The Principles of Quantum Mechanics




P. A. M. DIRAC


FELLOW OF ST. JOHN’S COLLEGE, CAMBRIDGE
IN THE UNIVERSITY OF CAMBRIDGE 


SECOND EDITION 


OXFORD 
AT THE CLARENDON PRESS 


OXFORD UNIVERSITY PRESS 
AMEN HOUSE, LONDON E.C. 4
LONDON EDINBURGH GLASGOW 
NEW YORK TORONTO MELBOURNE 
CAPETOWN BOMBAY CALCUTTA 
MADRAS SHANGHAI 


HUMFREY MILFORD 
Publisher to the University 


Cataloguing in Publication Data [adapted from the 4th edition revised] 
Dirac, P. A. M. 
The Principles of Quantum Mechanics.—2nd ed.— 
(The International series of monographs on physics) 
1. Quantum theory 
I. Title II. Series
530.72 QC174.12 


PRINTED IN GREAT BRITAIN 


PREFACE TO THE SECOND EDITION 


THE book has been mostly rewritten. I have tried by carefully overhauling
the method of presentation to give the development of the theory in a rather 
less abstract form, without making any sacrifices in exactness of expression or in 
the logical character of the development. This should make the work suitable for 
a wider circle of readers, although the reader who likes abstractness for its own 
sake may possibly prefer the style of the first edition. 

The main change has been brought about by the use of the word ‘state’ in 
a three-dimensional non-relativistic sense. It would seem at first sight a pity 
to build up the theory largely on the basis of non-relativistic concepts. The use 
of the non-relativistic meaning of ‘state’, however, contributes so essentially to 
the possibilities of clear exposition as to lead one to suspect that the fundamental 
ideas of the present quantum mechanics are in need of serious alteration at 
just this point, and that an improved theory would agree more closely with 
the development here given than with a development which aims at preserving 
the relativistic meaning of ‘state’ throughout.

Some mistakes which have been kindly pointed out to me by friends have been 
corrected and some new subject-matter has been inserted, the largest addition 
being a chapter on field theory. 


P. A. M. D. 


THE INSTITUTE FOR ADVANCED STUDY, 
PRINCETON. 
27 November 1934 




FROM THE PREFACE TO THE FIRST EDITION 


THE methods of progress in theoretical physics have undergone a vast change
during the twentieth century. The classical tradition has been to consider 
the world to be an association of observable objects (particles, fluids, 
fields, &c.) moving about according to definite laws of force, so that one could 
form a mental picture in space and time of the whole scheme. This led to 
a physics whose aim was to make assumptions about the mechanism and forces 
connecting these observable objects, to account for their behaviour in the simplest 
possible way. It has become increasingly evident in recent times, however, 
that nature works on a different plan. Her fundamental laws do not govern 
the world as it appears in our mental picture in any very direct way, but instead 
they control a substratum of which we cannot form a mental picture without 
introducing irrelevancies. The formulation of these laws requires the use of 
the mathematics of transformations. The important things in the world appear 
as the invariants (or more generally the nearly invariants, or quantities with 
simple transformation properties) of these transformations. The things we are 
immediately aware of are the relations of these nearly invariants to a certain frame 
of reference, usually one chosen so as to introduce special simplifying features 
which are unimportant from the point of view of general theory. 

The growth of the use of transformation theory, as applied first to relativity and 
later to the quantum theory, is the essence of the new method in theoretical physics. 
Further progress lies in the direction of making our equations invariant under wider 
and still wider transformations. This state of affairs is very satisfactory from 
a philosophical point of view, as implying an increasing recognition of the part 
played by the observer, by observing,¹ introducing the regularities that appear
in the observations, and a lack of arbitrariness in the ways of nature, but it 
makes things less easy for the learner of physics. The new theories, if one looks 
apart from their mathematical setting, are built up from physical concepts which 
cannot be explained in terms of things previously known to the student, which 
cannot even be explained adequately in words at all. Like the fundamental 
concepts (e.g. proximity, identity) which every one must learn on² one’s arrival into
the world, the newer concepts of physics can be mastered only by long familiarity 
with their properties and uses. 

From the mathematical side the approach to the new theories presents 
no difficulties, as the mathematics required (at any rate that which is required for 
the development of physics up to the early twentieth century³) is not essentially
different from what had been current for a considerable time. Mathematics is 


¹[‘in himself’ replaced with ‘by observing’, and ‘his’ observations becoming ‘the’ observations.]
²[‘his’ replaced by ‘one’s’.]
³[‘present’ substituted by ‘early twentieth century’.]




the tool specially suited for dealing with abstract concepts of any kind and there 
is no limit to its power in this field. For this reason a book on the new physics, 
if not purely descriptive of experimental work, must be essentially mathematical. 
All the same the mathematics is only a tool and one should learn to hold 
the physical ideas in one’s mind without reference to the mathematical form. 
In this book I have tried to keep the physics to the forefront, by beginning with 
an entirely physical chapter and in the later work examining the physical meaning 
underlying the formalism wherever possible. The amount of theoretical ground 
one has to cover before being able to solve problems of real practical value is 
rather large, but this circumstance is an inevitable consequence of the fundamental 
part played by transformation theory and is likely to become more pronounced in 
the theoretical physics of the future. 

With regard to the mathematical form in which the theory can be presented, 
an author must decide at the outset between two methods. There is 
the symbolic method, which deals directly in an abstract way with the quantities 
of fundamental importance (the invariants, &c., of the transformations) and 
there is the method of coordinates or representations, which deals with sets of 
numbers corresponding to these quantities. The second of these has usually 
been used for the presentation of quantum mechanics (in fact it has been used 
practically exclusively with the exception of Hermann Weyl’s book Gruppentheorie 
und Quantenmechanik¹). It is known under one or other of the two names
‘Wave Mechanics’ and ‘Matrix Mechanics’ according to which physical things 
receive emphasis in the treatment, the states of a system or its dynamical variables. 
It has the advantage that the kind of mathematics required is more familiar to 
the average student, and also it is the historical method. 

The symbolic method, however, seems to go more deeply into the nature 
of things. It enables one to express the physical laws in a neat and concise way, 
and will probably be increasingly used in the future as it becomes better 
understood and its own special mathematics gets developed. For this reason I have 
chosen the symbolic method, introducing the representatives later merely as an aid 
to practical calculation. This has necessitated a complete break from the historical 
line of development, but this break is an advantage through enabling the approach 
to the new ideas to be as direct as possible. I have given the connexion between 
the new theory and Niels Bohr’s orbit theory, because the latter is likely to be
useful in an elementary way for a long time to come.
P. A. M. D. 


ST. JOHN’S COLLEGE, CAMBRIDGE 
29 May 1930 


¹[The Theory of Groups and Quantum Mechanics, by Hermann Weyl, second edition, translated
1932 by Howard Percy Robertson (1903-1961); Library of Congress Control Number 32-2928.]


Contents 


PREFACE TO THE SECOND EDITION
FROM THE PREFACE TO THE FIRST EDITION


I. THE PRINCIPLE OF SUPERPOSITION 
1. The Need for a Quantum Theory
2. The Polarization of Photons
3. Interference of Photons
4. Superposition and Indeterminacy
5. Mathematical Formulation of the Principle
6. Analysis of the Principle


II. STATES AND OBSERVABLES 
7. The Vector Space representing the States
8. Observables as Linear Operators
9. Eigenvalues
10. The Expansion Theorem
11. Functions of an Observable
12. The General Physical Interpretation
13. Commutability and Compatibility


III. REPRESENTATION THEORY FOR DISCRETE EIGENVALUES

14. The Bracket Notation
Tos Nati Gains. async 8 vee os A Res MAS ee ed 
16. Eipen=1)" SAS» BaSi@ ys ic° arog eo te Cee ok ee a dR ew 
17. Transformation Theory
18. Probability Amplitudes
1 Oe TEAC, cas a eae Save ec ne Gee, & Ee cee Spit ee eo ee Se Gee 


IV. REPRESENTATION THEORY FOR CONTINUOUS EIGENVALUES
20. Introduction of the δ function




21. Properties of the δ function
22. Representations with One Continuous Parameter
23. General Representations
24. Weight Functions


V. THE QUANTUM CONDITIONS 
25. Poisson’s Brackets
26. Canonical Coordinates and Momenta
27. Momenta as Differential Operators
28. Heisenberg’s Principle of Uncertainty
29. Displacement Operators
30. Contact Transformations


VI. THE EQUATIONS OF MOTION 
31. Schrödinger’s Form for the Equations of Motion
32. Heisenberg’s Form for the Equations of Motion
33. The Action Principle
34. The Motion of Wave Packets
Oe. HE BROS PAROLE: a. ire 4 bates ct wad Neca ak hy a? eo enn SB eae ea ee 
36. The Harmonic Oscillator
37. The Gibbs Ensemble


VII. MOTION IN A CENTRAL FIELD OF FORCE 
38. Introduction of the Angular Momentum
39. Properties of Angular Momentum
40. Transition to Polar Coordinates
41. Energy-levels of the Hydrogen Atom
42. Selection Rules
43. The Zeeman Effect for the Hydrogen Atom
44. Combination of Angular Momenta


VIII. PERTURBATION THEORY 
45. General Remarks
46. The change in the energy-levels caused by a perturbation
47. The perturbation considered as causing transitions
48. Application to Radiation
49. Transitions caused by a Perturbation Independent of the Time
50. The Anomalous Zeeman Effect


IX. COLLISION PROBLEMS
51. General Remarks
52. The Scattering Coefficient
53. Solution with the p-Representation
54. Dispersive Scattering
55. Resonance Scattering
56. Emission and Absorption
X. SYSTEMS CONTAINING SEVERAL SIMILAR PARTICLES
57. Symmetrical and Antisymmetrical States
58. Permutations as Dynamical Variables
59. Permutations as Constants of the Motion
60. Determination of the Energy-levels
61. Application to Electrons
XI. THEORY OF RADIATION
62. Second Quantization
63. Waves and Einstein-Bose Particles
64. Application to Photons
65. Determination of the Interaction Energy between a Photon and an Atom
66. Emission, Absorption and Scattering of Radiation
67. Einstein’s Laws of Radiation
XII. RELATIVISTIC THEORY OF THE ELECTRON
68. Relativistic Treatment of a Particle
69. The Wave Equation for the Electron
70. Invariance under a Lorentz Transformation
71. The Motion of a Free Electron
72. Existence of the Spin
73. Transition to Polar Variables
74. The Fine-Structure of the Energy-Levels of Hydrogen
75. Theory of the Positron
XIII. FIELD THEORY
76. Quantum Conditions for the Electromagnetic Field
77. Quantum Conditions for the Electromagnetic Potentials
78. The Supplementary Conditions
79. Interaction of Field and Particles
80. The Quantization of Electron Waves


81. Conclusion


Index 




I. THE PRINCIPLE OF SUPERPOSITION
1. The Need for a Quantum Theory 


CLASSICAL mechanics has been developed continuously from the time 
of Sir Isaac Newton and applied to an ever-widening range of dynamical 
systems, including, after the formalism is adapted to relativity requirements, 
the electromagnetic field in interaction with matter. The underlying ideas 
and the laws governing their application form a simple and elegant scheme, 
which one would be inclined to think could not be seriously modified without 
having all its attractive features spoilt. Nevertheless the passage to a new scheme,
called quantum mechanics, has been found to be necessary for the discussion
of phenomena on the atomic scale, and the new scheme is even more elegant and
satisfying than the classical one. This is brought about by the fact that the changes
which the new scheme requires are of a very profound character and do not clash 
with the shallower features that make the classical theory so attractive, as a result 
of which all these features can be taken over unchanged into the new scheme. 

The necessity for a departure from classical mechanics is clearly shown 
by experimental results. In the first place the forces known in classical 
electrodynamics are inadequate for the explanation of the remarkable stability 
of atoms and molecules, which is necessary in order that materials may have 
any definite physical and chemical properties at all. The introduction of 
new hypothetical forces will not save the situation, since there exist general 
principles of classical mechanics, holding for all kinds of forces, leading to 
results in direct² disagreement with observation. For example, if an atomic
system has its equilibrium disturbed in any way and is then left alone, 
it will be set in oscillation and the oscillations will get impressed on 
the surrounding electromagnetic field, so that their frequencies may be observed 
with a spectroscope. Now whatever the laws of force governing the equilibrium, 
one would expect to be able to include the various frequencies in a scheme 
comprising certain fundamental frequencies and their harmonics. This is not
observed to be the case. Instead, there is observed a new and unexpected connexion
between the frequencies, called Ritz’s Combination Law of Spectroscopy,³ which is
quite unintelligible from the classical standpoint. 
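The text names Ritz’s Combination Law without stating it. In its usual form, every observed line frequency is the difference of two spectral ‘terms’, ν_mn = T_n − T_m, so the frequency of one line equals the sum of the frequencies of two others that share an intermediate term. The minimal Python sketch below assumes hydrogen-like terms T_n = R/n², with the Rydberg constant R set to 1 purely for illustration (not data from this book), and uses exact fractions. It also checks that such a frequency is not an integer multiple of another line, as it would have to be if that line were the fundamental of a single classical vibration.

```python
from fractions import Fraction


def term(n: int) -> Fraction:
    """Assumed hydrogen-like spectral term T_n = R / n**2, with R set to 1."""
    return Fraction(1, n * n)


def line(m: int, n: int) -> Fraction:
    """Frequency of the line between upper term m and lower term n (m > n): T_n - T_m."""
    if m <= n:
        raise ValueError("expected m > n")
    return term(n) - term(m)


# Combination law: one line is the sum of two lines sharing the intermediate term 2.
nu_31, nu_32, nu_21 = line(3, 1), line(3, 2), line(2, 1)
print(nu_31, nu_32 + nu_21)  # 8/9 8/9

# If nu_21 were a classical fundamental, the other lines would be harmonics k * nu_21.
harmonics = {k * nu_21 for k in range(1, 10)}
print(nu_31 in harmonics)  # False
```
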


²[The second edition has ‘violent’, which is replaced by ‘direct’ in the third edition.]
³[‘On a New Law of Series Spectra,’ Ritz, W. The Astrophysical Journal 28 (1908) p. 237 | https://ui.adsabs.harvard.edu/abs/1908ApJ....28..237R | doi:10.1086/141591]




One might try to get over the difficulty without departing from classical 
mechanics by assuming each of the spectroscopically observed frequencies to be 
a fundamental frequency with its own degree of freedom, the laws of force being 
such that the harmonic vibrations do not occur. Such a theory will not do, however, 
even apart from the fact that it would give no explanation of the Combination Law, 
since it would immediately bring one into conflict with the experimental evidence 
on specific heats. Classical statistical mechanics enables one to establish a general 
connexion between the total number of degrees of freedom of an assembly of 
vibrating systems and its specific heat. If one assumes all the spectroscopic 
frequencies of an atom to correspond to different degrees of freedom, one would get 
a specific heat for any kind of matter incomparably greater than the observed value. 
In fact the observed specific heats at ordinary temperatures are given fairly well 
by a theory that takes into account merely the motion of each atom as a whole 
and assigns no internal motion to it at all. 
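The counting argument can be made concrete with the classical equipartition theorem, which the passage relies on without naming: each translational degree of freedom adds ½R to the molar heat capacity at constant volume, and each harmonic vibration adds R (½R for its kinetic and ½R for its potential energy). The sketch below applies this to a monatomic gas, which is an assumed example; the number of internal vibrations per atom is a hypothetical input, chosen only to show how quickly the classical prediction grows.

```python
R = 8.314462618  # molar gas constant, J/(mol K)


def classical_cv(internal_vibrations: int) -> float:
    """Equipartition molar heat capacity at constant volume of a monatomic gas.

    3/2 R comes from the motion of each atom as a whole (three translational
    degrees of freedom); each internal harmonic vibration adds R.
    """
    return (1.5 + internal_vibrations) * R


# Hypothetical counts: 0 means no internal motion; larger values assign one
# degree of freedom to each observed spectral frequency, as in the rejected theory.
for n in (0, 10, 100):
    print(f"{n:4d} internal vibrations -> C_v = {classical_cv(n):8.1f} J/(mol K)")
```

For a monatomic gas such as argon, the measured value near room temperature is close to 3/2 R, about 12.5 J/(mol K), which is the value with no internal vibrations at all.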

This leads us to a new clash between classical mechanics and the results 
of experiment. There must certainly be some internal motion in an atom 
to account for its spectrum, but the internal degrees of freedom, for some classically 
inexplicable reason, do not contribute to the specific heat. A similar clash is 
found in connexion with the energy of oscillation of the electromagnetic field 
in a vacuum. Classical mechanics requires the specific heat corresponding to 
this energy to be infinite, but it is observed to be quite finite. A general conclusion 
from experimental results is that oscillations of high frequency do not contribute 
their classical quota to the specific heat. 

As another illustration of the failure of classical mechanics we may consider 
the behaviour of light. We have, on the one hand, the phenomena of interference 
and diffraction, which can be explained only on the basis of a wave theory; 
on the other, phenomena such as photo-electric emission and scattering by free 
electrons, which show that light is composed of small particles. These particles, 
which are called photons, have each a definite energy and momentum, depending 
on the frequency of the light, and appear to have just as real an existence as 
electrons, or any other particles known in physics. A fraction of a photon is 
never observed. 

Modern experiments have shown that this anomalous behaviour is not peculiar 
to light, but is quite general. All material particles have wave properties, which can 
be exhibited under suitable conditions. We have here a very surprising¹ and general
example of the breakdown of classical mechanics—not merely an inaccuracy in 
its laws of motion, but an inadequacy of its concepts to supply us with a description 
of atomic events. 


¹[‘surprising’ replaces ‘striking’.]




The necessity to depart from classical ideas when one wishes to account for 
the ultimate structure of matter may be seen, not only from experimentally 
established facts, but also from general philosophical grounds. In a classical 
explanation of the constitution of matter, one would assume it to be made up 
of a large number of small constituent parts and one would postulate laws for 
the behaviour of these parts, from which the laws of the matter in bulk could 
be deduced. This would not complete the explanation, however, since the question 
of the structure and stability of the constituent parts is left untouched. To go 
into this question, it becomes necessary to postulate that each constituent part 
is itself made up of smaller parts, in terms of which its behaviour is to be 
explained. There is clearly no end to this procedure, so that one can never arrive 
at the ultimate structure of matter on these lines. So long as big and small are 
merely relative concepts, it is no help to explain the big in terms of the small. It is 
therefore necessary to modify classical ideas in such a way as to give an absolute 
meaning to size. 

At this stage it becomes important to remember that science is concerned only 
with observable things and that we can observe an object only by letting it interact 
with some outside influence. An act of observation is thus necessarily accompanied 
by some disturbance of the object observed. We may define an object to be 
big when the disturbance accompanying our observation of it may be neglected, 
and small when the disturbance cannot be neglected. This definition is in close 
agreement with the common meanings of big and small. 

It is usually assumed that, by being careful, we may cut down the disturbance 
accompanying our observation to any desired extent. The concepts of big 
and small are then purely relative and refer to the gentleness of our means 
of observation as well as to the object being described. In order to give 
an absolute meaning to size, such as is required for any theory of the ultimate 
structure of matter, it becomes necessary to assume that there is a limit to 
the fineness of our powers of observation and the smallness of the accompanying 
disturbance—a limit which is inherent in the nature of things and can never be 
surpassed by improved technique or increased skill on the part of the observer. 
If the object under observation is such that the unavoidable limiting disturbance 
is negligible, then the object is big in the absolute sense and we may apply classical 
mechanics to it. If, on the other hand, the limiting disturbance is not negligible, 
then the object is small in the absolute sense and we require a new theory for 
dealing with it. 

A consequence of the preceding discussion is that we must revise our ideas 
of causality. Causality applies only to a system which is left undisturbed. 
If a system is small, we cannot observe it without producing a serious disturbance 
and hence we cannot expect to find any causal connexion between the results
of our observations. There is thus an essential indeterminacy in the quantum
theory, of a kind that has no analogue in the classical theory, where causality 
reigns supreme. The quantum theory does not enable us in general to calculate 
the result of an observation, but only the probability of our obtaining a particular 
result when we make the observation. 

The lack of determinacy in the quantum theory should not be considered as
a thing to be regretted. It is necessary for a rational theory of the ultimate 
structure of matter. One of the most satisfactory features of the present 
quantum theory is that the differential equations that express the causality 
of classical mechanics do not get lost, but are all retained in symbolic form, 
and indeterminacy appears only in the application of these equations to the
results of observations. 


2. The Polarization of Photons 


The discussion in the preceding section about the limit to the gentleness with which 
observations can be made and the consequent indeterminacy in the results of those 
observations does not provide any quantitative basis for the building up of quantum 
mechanics. For this purpose a new set of accurate laws of nature is required. 
One of the most fundamental and most drastic of these is the Principle of 
Superposition of States. We shall lead up to a general formulation of this principle 
through a consideration of some special cases, taking first the example provided 
by the polarization of light. 

It is known experimentally that when plane-polarized light is used for 
ejecting photo-electrons, there is a preferential direction for the electron emission. 
Thus the polarization properties of light are closely connected with its corpuscular 
properties and one must ascribe a polarization to the photons. One must consider, 
for instance, a beam of light plane-polarized in a certain direction as consisting of 
photons each of which is plane-polarized in that direction and a beam of circularly 
polarized light as consisting of photons each circularly polarized. Every photon 
is in a certain state of polarization, as we shall say. The problem we must now 
consider is how to fit in these ideas with the known facts about the resolution of 
light into polarized components and the recombination of these components. 

Let us take a definite case. Suppose we have a beam of light passing through 
a crystal of tourmaline, which has the property of letting through only light 
plane-polarized perpendicular to its optic axis. Classical electrodynamics tells 
us what will happen for any given polarization of the incident beam. If this beam 
is polarized perpendicular to the optic axis, it will all go through the crystal; 
if parallel to the axis, none of it will go through; while if polarized at an angle α to
the axis, a fraction sin²α will go through. How are we to understand these results
on a photon basis? 




A beam that is plane-polarized in a certain direction is to be pictured as 
made up of photons each plane-polarized in that direction. This picture leads 
to no difficulty in the cases when our incident beam is polarized perpendicular or 
parallel to the optic axis. We merely have to suppose that each photon polarized 
perpendicular to the axis passes unhindered and unchanged through the crystal, 
while each photon polarized parallel to the axis is stopped and absorbed. 
A difficulty arises, however, in the case of the obliquely polarized incident beam. 
Each of the incident photons is then obliquely polarized and it is not clear what 
will happen to such a photon when it reaches the tourmaline. 

A question about what will happen to a particular photon under certain 
conditions is not really very precise. To make it precise one must imagine some 
experiment performed having a bearing on the question and inquire what will be 
the result of the experiment. Only questions about the results of experiments 
have a real significance and it is only such questions that theoretical physics has 
to consider. 

In our present example the obvious experiment is to use an incident beam 
consisting of only a single photon and to observe what appears on the back 
side of the crystal. If one does this experiment, then according to quantum 
mechanics sometimes one will find a whole photon, of energy equal to the energy 
of the incident photon, on the back side and other times one will find nothing. 
When one finds a whole photon, it will be polarized perpendicular to the optic axis. 
One will never find only a part of a photon on the back side. If one repeats 
the experiment a large number of times, one will find the photon on the back side in 
a fraction sin²α of the total number of times. Thus we may say that the photon has
a probability sin²α of passing through the tourmaline and appearing on the back
side polarized perpendicular to the axis and a probability cos²α of being absorbed.
These values for the probabilities lead to the correct classical results for an incident 
beam containing a large number of photons. 
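The probabilistic description can be checked against the classical fraction with a short simulation. The Python sketch below is illustrative only: the function names and the fixed random seed are assumptions chosen for reproducibility. Each single-photon trial ends with a whole photon or nothing, drawn with probability sin²α, and the transmitted fraction approaches sin²α as the number of photons grows.

```python
import math
import random


def photon_passes(alpha: float, rng: random.Random) -> bool:
    """One photon polarized at angle alpha (radians) to the optic axis meets the crystal.

    Returns True if a whole photon appears on the back side (probability sin^2 alpha)
    and False if it is absorbed (probability cos^2 alpha); there is no partial outcome.
    """
    return rng.random() < math.sin(alpha) ** 2


def transmitted_fraction(alpha: float, n_photons: int, seed: int = 0) -> float:
    """Repeat the single-photon experiment n_photons times; return the fraction passed."""
    rng = random.Random(seed)
    passed = sum(photon_passes(alpha, rng) for _ in range(n_photons))
    return passed / n_photons


alpha = math.radians(30)  # assumed example angle; sin^2(30 degrees) = 0.25
for n in (10, 1_000, 100_000):
    frac = transmitted_fraction(alpha, n)
    print(f"{n:>7} photons: fraction {frac:.4f}, sin^2 alpha {math.sin(alpha) ** 2:.4f}")
```

Only the fraction over many trials is predicted; for N photons it fluctuates around sin²α by roughly √(sin²α cos²α / N).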

In this way we preserve the individuality of the photon in all cases. 
We are able to do this, however, only because we abandon the determinacy 
of the classical theory. The result of an experiment is not determined, 
as it would be according to classical ideas, by the conditions under the control 
of the experimenter. The most that can be predicted is a set of possible results, 
with a probability of occurrence for each. 

The foregoing discussion about the result of an experiment with a single 
obliquely polarized photon incident on a crystal of tourmaline answers all that 
can legitimately be asked about what happens to an obliquely polarized photon 
when it reaches the tourmaline. Questions about what decides whether the photon 
is to go through or not and how it changes its direction of polarization when 
it does go through cannot be investigated by experiment and should be regarded
as outside the domain of science, at least for this discussion.¹ Nevertheless some
further description is necessary in order to correlate the results of this experiment 
with the results of other experiments that might be performed with photons and 
to fit them all into a general scheme. Such further description should be regarded, 
not as an attempt to answer questions outside the domain of science, but as an aid 
to the formulation of rules for expressing concisely the results of large numbers 
of experiments. 

The further description provided by quantum mechanics runs as follows. It is 
supposed that a photon polarized obliquely to the optic axis may be regarded 
as being partly in the state of polarization parallel to the axis and partly 
in the state of polarization perpendicular to the axis. The state of oblique 
polarization may be considered as the result of some kind of superposition process 
applied to the two states of parallel and perpendicular polarization. This implies 
a certain special kind of relationship between the various states of polarization, 
a relationship similar to that between polarized beams in classical optics, but which 
is now to be applied, not to beams, but to the states of polarization of one 
particular photon. This relationship allows any state of polarization to be 
resolved into, or expressed as a superposition of, any two mutually perpendicular 
states of polarization. 
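The resolution rule can be illustrated numerically ahead of the book’s own mathematical formulation. The sketch below rests on a simplifying assumption, not on the book’s formalism: a plane polarization state is represented by a real unit vector in the plane perpendicular to the beam (circular polarization would need complex components). It resolves an oblique state into several different pairs of mutually perpendicular directions and checks that the squared components always sum to 1. For the pair parallel and perpendicular to the optic axis, the squared components are cos²α and sin²α, the probabilities found above.

```python
import math


def plane_polarization(angle: float) -> tuple[float, float]:
    """Assumed model: unit vector for plane polarization at `angle` radians from the optic axis."""
    return (math.cos(angle), math.sin(angle))


def dot(u: tuple[float, float], v: tuple[float, float]) -> float:
    """Euclidean inner product of two 2-D vectors."""
    return u[0] * v[0] + u[1] * v[1]


def resolve(state: tuple[float, float], basis_angle: float) -> tuple[float, float]:
    """Components of `state` along the perpendicular directions basis_angle and basis_angle + 90 degrees."""
    first = plane_polarization(basis_angle)
    second = plane_polarization(basis_angle + math.pi / 2)
    return dot(state, first), dot(state, second)


oblique = plane_polarization(math.radians(30))  # alpha = 30 degrees (assumed example)
for beta in (0, 45, 75):  # orientation of the perpendicular pair, in degrees
    a, b = resolve(oblique, math.radians(beta))
    print(f"pair at {beta:2d} deg: components ({a:+.4f}, {b:+.4f}), sum of squares {a * a + b * b:.6f}")
```
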

When we make the photon meet a tourmaline crystal, we are subjecting it to 
an observation. We are observing whether it is polarized parallel or perpendicular 
to the optic axis. The effect of making this observation is to force the photon 
entirely into the state of parallel or entirely into the state of perpendicular 
polarization. It has to make a sudden jump from being partly in each of these 
two states to being entirely in one or other of them. Which of the two states it will 
jump into cannot be predicted, but is governed only by probability laws. If it jumps 
into the parallel state it gets absorbed and if it jumps into the perpendicular state 
it passes through the crystal and appears on the other side preserving this state 
of polarization. 
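The ‘jump’ description has a checkable consequence: a photon that emerges from the crystal is entirely in the perpendicular state, so a second crystal with the same orientation should let it through with certainty. The Python sketch below encodes that rule under stated assumptions (two crystals with parallel optic axes, hypothetical function names, a fixed random seed). The state after the first observation is represented by the angle 90° to the axis.

```python
import math
import random

PARALLEL = "parallel"
PERPENDICULAR = "perpendicular"


def observe(alpha: float, rng: random.Random) -> str:
    """Tourmaline observation of a photon polarized at alpha (radians) to the optic axis.

    The photon jumps entirely into one of the two states: perpendicular with
    probability sin^2 alpha, parallel (and then absorbed) otherwise.
    """
    return PERPENDICULAR if rng.random() < math.sin(alpha) ** 2 else PARALLEL


def two_crystals(alpha: float, n_photons: int, seed: int = 1) -> tuple[int, int]:
    """Send photons through two crystals whose optic axes are assumed parallel."""
    rng = random.Random(seed)
    through_first = through_both = 0
    for _ in range(n_photons):
        if observe(alpha, rng) == PARALLEL:
            continue  # absorbed in the first crystal
        through_first += 1
        # The emerging photon preserves the perpendicular state: alpha is now 90 degrees.
        if observe(math.pi / 2, rng) == PERPENDICULAR:
            through_both += 1
    return through_first, through_both


first, both = two_crystals(math.radians(30), 10_000)
print(f"passed first crystal: {first}, passed both: {both}")
```
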


3. Interference of Photons 


In this section we shall deal with another example of superposition. We shall 
again take photons, but shall be concerned with their position in space and 
their momentum instead of their polarization. If we are given a beam of roughly 
monochromatic light, then we know something about the location and momentum 
of the associated photons. We know that each of them is located somewhere in 
the region of space through which the beam is passing and has a momentum in 
the direction of the beam of magnitude given in terms of the frequency of the beam 


¹[The limitation to this discussion has been added.]




by Albert Einstein’s photo-electric law—momentum equals frequency multiplied Einstein’s 
by a universal constant! When we have such information about the location and electric law 


photo- 


momentum of a photon we shall say that it is in a definite state of motion. A state state of motion 


of motion is completely specified when one is given that it is associated with 
a certain beam. 
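As a numerical illustration, the relation quoted here is p = hν/c for a photon of frequency ν, so the universal constant is h/c, with h Planck's constant and c the speed of light. The sketch uses the exact SI values of h and c; the frequency 5 × 10¹⁴ Hz (roughly that of visible light) is an arbitrary example, not taken from the text.

```python
# Exact SI values of the constants.
PLANCK_H = 6.62607015e-34       # J s
SPEED_OF_LIGHT = 299_792_458.0  # m / s


def photon_momentum(frequency_hz):
    """Momentum (kg m / s) of a photon of the given frequency: p = h * nu / c."""
    return PLANCK_H * frequency_hz / SPEED_OF_LIGHT


p = photon_momentum(5e14)
print(f"{p:.3e} kg m/s")  # about 1.105e-27 kg m/s
```

Doubling the frequency doubles the momentum, which is the proportionality the text describes.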

We shall discuss the description which quantum mechanics provides of 
the interference of photons. Let us take a definite experiment demonstrating 
interference. Suppose we have a beam of light which is passed through 
some kind of interferometer, so that it gets split up into two components 
and the two components are subsequently made to interfere. We may, as in 
the preceding section, take an incident beam consisting of only a single photon 
and inquire what will happen to it as it goes through the apparatus. This will 
present to us the difficulty of the conflict between the wave and corpuscular theories 
of light in an acute form. 

Corresponding to the description that we had in the case of the polarization,
we must now describe the photon as going partly into each of the two components
into which the incident beam is split. The photon is then, as we may say, in a state
of motion given by the superposition of the two states of motion associated with
the two components. We are thus led to a generalization of the term ‘state of
motion’ applied to a photon. For a photon to be in a definite state of motion
it need not be associated with one single beam of light, but may be associated
with two or more beams of light which are the components into which one original
beam has been split.² In the accurate mathematical theory each state of motion
is associated with one of the wave functions of ordinary wave optics, which wave
function may describe either a single beam or two or more beams into which one
original beam has been split. States of motion are thus superposable in a similar
way to wave functions.

Let us consider now what happens when we determine the energy in one of
the components. The result of such a determination must be either the whole
photon or nothing at all. Thus the photon must change suddenly from being
partly in one beam and partly in the other to being entirely in one of the beams.
This sudden change is due to the disturbance in the state of motion of the photon
which the observation necessarily makes. It is impossible to predict in which of
the two beams the photon will be found. Only the probability of either result can
be calculated from the previous distribution of the photon over the two beams.

¹[Einstein, A. (1905). ‘Über einen die Erzeugung und Verwandlung des
Lichtes betreffenden heuristischen Gesichtspunkt’, Annalen der Physik, 322(6),
pp. 132-148, doi:10.1002/andp.19053220607. English translation: ‘On a Heuristic
Point of View about the Creation and Conversion of Light’, Wikisource,
https://en.wikisource.org/?curid=59468]

²The circumstance that the superposition idea requires us to generalize our original
meaning of states of motion, but that no corresponding generalization was needed for
the states of polarization of the preceding section, is an accidental one with no underlying
theoretical significance.

One could carry out the energy measurement without destroying the component 
beam by, for example, reflecting the beam from a movable mirror and observing 
the recoil. Our description of the photon allows us to infer that, after such 
an energy measurement, it would not be possible to bring about any interference 
effects between the two components. So long as the photon is partly in one 
beam and partly in the other, interference can occur when the two beams are 
superposed, but this possibility disappears when the photon is forced entirely into 
one of the beams by an observation. The other beam then no longer enters into 
the description of the photon, which therefore counts as being entirely in the one 
beam in the ordinary way for any experiment that may subsequently be performed 
on it. 
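A minimal numerical sketch of this description, under simplifying assumptions that are not in the text: the two component beams are modelled as unit-amplitude plane waves meeting at position x with phases +kx and −kx, and the photon's state is the superposition c1·ψ1 + c2·ψ2 with complex weights c1, c2. An energy measurement on one beam finds the photon there with probability |c1|²/(|c1|² + |c2|²). Without such a measurement the amplitudes add and the detection probability contains an interference term; once the photon has been forced into one beam, the probabilities of the two alternatives add instead and the interference term is absent.

```python
import cmath
import math


def beam_probabilities(c1, c2):
    """Probabilities of finding the photon entirely in beam 1 or in beam 2
    when the energy in one of the beams is measured."""
    norm = abs(c1) ** 2 + abs(c2) ** 2
    return abs(c1) ** 2 / norm, abs(c2) ** 2 / norm


def screen_probability(c1, c2, x, k=1.0, which_path_measured=False):
    """Relative probability of detecting the photon at position x after the
    two beams are superposed (illustrative plane-wave model)."""
    amp1 = c1 * cmath.exp(1j * k * x)    # contribution of beam 1
    amp2 = c2 * cmath.exp(-1j * k * x)   # contribution of beam 2
    if which_path_measured:
        # The photon is entirely in one beam or the other: probabilities add.
        return abs(amp1) ** 2 + abs(amp2) ** 2
    # The photon is partly in each beam: amplitudes add, giving interference.
    return abs(amp1 + amp2) ** 2


c = 1 / math.sqrt(2)  # equal weights, zero phase difference
print(beam_probabilities(c, c))  # each beam about 0.5
for x in (0.0, math.pi / 4, math.pi / 2):
    print(x, screen_probability(c, c, x), screen_probability(c, c, x, which_path_measured=True))
```

With no measurement the values run from about 2 down to about 0 as x varies (fringes); after the which-path measurement they stay at 1 everywhere.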

On these lines quantum mechanics is able to effect a union of the wave and 
corpuscular properties of light. The essential point is the association of each
of the states of motion of a photon with one of the wave functions of ordinary 
wave optics. The nature of this association cannot be pictured on a basis of classical 
mechanics, but is something entirely new. It would be quite wrong to picture 
the photon and its associated wave as interacting in the way in which particles 
and waves can interact in classical mechanics. The association can be interpreted 
only statistically, the wave function giving us information about the probability 
of our finding the photon in any particular place when we make an observation of 
where it is. 

Some time before the discovery of quantum mechanics people realized that 
the connexion between light waves and photons must be of a statistical character. 
What they did not clearly realize, however, was that the wave function gives 
information about the probability of one photon being in a particular place and not 
the probable number of photons in that place. The importance of the distinction 
can be made clear in the following way. Suppose we have a beam of light consisting 
of a large number of photons split up into two components of equal intensity. 
On the assumption that the intensity of a beam is connected with the probable 
number of photons in it, we should have half the total number of photons going 
into each component. If the two components are now made to interfere, we should 
require a photon in one component to be able to interfere with one in the other. 
Sometimes these two photons would have to annihilate one another and other times 
they would have to produce four photons. This would contradict the conservation 
of energy. The new theory, which connects the wave function with probabilities 
for one photon, gets over the difficulty by making each photon go partly into each 


of the two components. Each photon then interferes only with itself. Interference 
between two different photons never occurs. 
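The statement that each photon interferes only with itself can be illustrated by sending photons through the apparatus one at a time in a simulation. The sketch assumes, for illustration, a photon split into two beams of equal weight and zero phase difference, so that its detection probability at position x is proportional to 1 + cos 2kx (with k = 1); positions are drawn by rejection sampling. Each simulated photon is detected once, independently of every other photon, yet fringes appear in the accumulated counts.

```python
import math
import random


def single_photon_density(x, k=1.0):
    """Unnormalized detection probability at x for one photon split into two
    beams of equal weight and zero phase difference: 1 + cos(2kx)."""
    return 1.0 + math.cos(2 * k * x)


def detect_photons(n_photons, seed=0, x_max=math.pi):
    """Send photons one at a time; each is detected once, at a position drawn
    from its own probability density by rejection sampling (max density 2)."""
    rng = random.Random(seed)
    hits = []
    while len(hits) < n_photons:
        x = rng.uniform(0.0, x_max)
        if rng.uniform(0.0, 2.0) < single_photon_density(x):
            hits.append(x)
    return hits


def histogram(samples, n_bins, x_max=math.pi):
    """Count detections in n_bins equal bins over [0, x_max]."""
    counts = [0] * n_bins
    for x in samples:
        counts[min(int(x / x_max * n_bins), n_bins - 1)] += 1
    return counts


counts = histogram(detect_photons(20_000), n_bins=10)
print(counts)  # large near x = 0 and x = pi, small near x = pi / 2
```

No photon in the simulation ever interacts with another; the pattern is built entirely from single-photon probabilities.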

The association of particles with waves discussed above is not restricted to 
the case of light, but is, according to modern theory, of universal applicability. 
All kinds of particles are associated with waves in this way and conversely all wave 
motion is associated with particles. Thus all particles can be made to exhibit 
interference effects and all wave motion has its energy in the form of quanta. 
The reason why these general phenomena are not more obvious is on account 
of a law of proportionality between the mass or energy of the particles and 
the frequency of the waves, the coefficient being such that for waves of familiar 
frequencies the associated quanta are extremely small, while for particles even 
as light as electrons the associated wave frequency is so high that it is not easy 
to demonstrate interference. 
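To put rough numbers to this remark, the proportionality in question is E = hν. The sketch assumes, purely for illustration, a 1 MHz radio wave and takes the electron's energy to be its rest energy mc². The constants are the exact SI values of h and c and the CODATA 2018 value of the electron mass.

```python
PLANCK_H = 6.62607015e-34          # J s (exact SI value)
SPEED_OF_LIGHT = 299_792_458.0     # m / s (exact SI value)
ELECTRON_MASS = 9.1093837015e-31   # kg (CODATA 2018)


def quantum_energy(frequency_hz):
    """Energy (J) of one quantum of a wave of the given frequency: E = h nu."""
    return PLANCK_H * frequency_hz


def wave_frequency(energy_joule):
    """Frequency (Hz) of the wave associated with a particle of energy E: nu = E / h."""
    return energy_joule / PLANCK_H


radio_quantum = quantum_energy(1e6)  # a 1 MHz wave: each quantum carries very little energy
electron_freq = wave_frequency(ELECTRON_MASS * SPEED_OF_LIGHT ** 2)
print(f"{radio_quantum:.2e} J, {electron_freq:.2e} Hz")
```

The radio-wave quantum is about 6.6 × 10⁻²⁸ J, while the electron's wave frequency comes out near 1.2 × 10²⁰ Hz.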


4. Superposition and Indeterminacy 


The reader may possibly be feeling dissatisfied with the attempt in the two 
preceding sections to fit in the existence of photons with the classical 
theory of light. It may be argued¹ that a very strange idea has been
introduced (the possibility of a photon being partly in each of two states of
polarization, or partly in each of two separate beams), but even with the help of
this strange idea no satisfying picture of the fundamental single-photon processes
has been given. One may say² further that this strange idea did not provide any
information about experimental results for the experiments discussed, beyond what
could have been obtained from an elementary consideration of photons being
guided in some vague way by waves. What, then, is the use of the strange idea?

In answer to the first criticism it may be remarked that the main object
of physical science is not the provision of pictures, but is the formulation of
laws governing phenomena and the application of these laws to the discovery
of new phenomena. If a picture exists, so much the better; but whether
a picture exists or not is a matter of only secondary importance. In the case
of atomic phenomena no picture can be expected to exist in the usual sense
of the word ‘picture’, by which is meant a model functioning essentially
on classical lines. One may, however, extend the meaning of the word
‘picture’ to include any way of looking at the fundamental laws which makes
their self-consistency obvious. With this extension, one may gradually acquire
a picture of atomic phenomena by becoming familiar with the laws of
the quantum theory.

¹[Original: He may argue]
²[Original: He may say]


With regard to the second criticism, it may be remarked that for many simple 
experiments with light, an elementary theory of waves and photons connected in 
a vague statistical way would be adequate to account for the results. In the case 
of such experiments quantum mechanics has no further information to give. 
In the great majority of experiments, however, the conditions are too complex for 
an elementary theory of this kind to be applicable and some more elaborate scheme, 
such as is provided by quantum mechanics, is then needed. The method of 
description that quantum mechanics gives in the more complex cases is applicable 
also to the simple cases and although it is then not really necessary for accounting 
for the experimental results, its study in these simple cases is perhaps a suitable 
introduction to its study in the general case. 

Before we can discuss the principle of superposition in the general case, we must 
introduce the important concept of state of an atomic system. Let us take 
a general atomic system, composed of particles or bodies with specified properties 
(mass, moment of inertia, etc.) interacting according to specified laws of force. 
There will be various possible motions of the particles or bodies consistent 
with the laws of force. Each such motion is called a state of the system. 
According to classical ideas one could specify a state by giving numerical values 
to all the coordinates and velocities of the various component parts of the system 
at some instant of time, the whole motion being then completely determined. 
Now the argument about the disturbance of observation¹ shows that we cannot
observe a small system with that amount of detail which classical theory supposes.
The limitation in the power of observation puts a limitation on the number of
data that can be assigned to a state. Thus a state of an atomic system must be
specified by fewer or more indefinite data than a complete set of numerical values
for all the coordinates and velocities at some instant of time. In the case when
the system is just a single photon, a state would be completely specified by a given
state of motion in the sense of §3 together with a given state of polarization in
the sense of §2.

¹[of p. 3]

A state of a system may be defined as a motion that is restricted by as
many conditions or data as are theoretically possible without mutual interference
or contradiction. In practice the conditions could be imposed by a suitable
preparation of the system, consisting perhaps in passing it through various kinds
of sorting apparatus, such as slits and polarimeters.

The general principle of superposition of quantum mechanics applies to
the states, as thus defined, of any one dynamical system. It requires us to assume
that between these states there exist peculiar relationships such that whenever
the system is definitely in one state we can consider it as being partly in each
of two or more other states. The original state must be regarded as the result
of a kind of superposition of the two or more new states, in a way that cannot
be conceived on classical ideas. Any state may be considered as the result of 
a superposition of two or more other states, and indeed in an infinite number 
of ways. Conversely any two or more states may be superposed to give a new state. 
The procedure of expressing a state as the result of superposition of a number of 
other states is a mathematical procedure that is always permissible, independent 
of any reference to physical conditions, like the procedure of resolving a wave 
into Fourier components. Whether it is useful in any particular case, though, 
depends on the special physical conditions of the problem under consideration. 
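The remark that a state can be decomposed in infinitely many ways, like a wave into Fourier components, can be made concrete for polarization. The sketch assumes, ahead of the vector formulation of §5, that a polarization state is represented by a complex 2-vector, and decomposes it into a pair of perpendicular linear-polarization states at an arbitrary angle β; every β gives a different, equally valid decomposition. The helper names are hypothetical.

```python
import math


def basis(beta):
    """Perpendicular linear-polarization states at angles beta and beta + 90 deg
    to a fixed reference direction (an illustrative choice of 'other states')."""
    c, s = math.cos(beta), math.sin(beta)
    return (complex(c), complex(s)), (complex(-s), complex(c))


def inner(u, v):
    """Complex inner product: sum of conj(u_i) * v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def decompose(state, beta):
    """Weights (x1, x2) such that state = x1 * e1 + x2 * e2 in the basis at beta."""
    e1, e2 = basis(beta)
    return inner(e1, state), inner(e2, state)


def recombine(weights, beta):
    """Superpose the basis states at beta with the given weights."""
    e1, e2 = basis(beta)
    x1, x2 = weights
    return tuple(x1 * a + x2 * b for a, b in zip(e1, e2))


# Circular polarization, as an example state.
state = (1 / math.sqrt(2), 1j / math.sqrt(2))
for beta in (0.0, 0.3, 1.0):
    weights = decompose(state, beta)
    print(beta, weights, recombine(weights, beta))
```

The weights change with β, but recombining them always returns the original state.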

In the two preceding sections examples were given of the superposition principle 
applied to a system consisting of a single photon. §2 dealt with states differing 
only with regard to the polarization and §3 with states differing only with regard 
to the motion of the photon as a whole. 

The nature of the relationships which the superposition principle requires 
to exist between the states of any system is of a kind that cannot be explained 
in terms of familiar physical concepts. One cannot in the classical sense picture 
a system being partly in each of two states and see the equivalence of this to 
the system being completely in some other state. There is an entirely new idea 
involved, to which one must get accustomed and in terms of which one must 
proceed to build up an exact mathematical theory, without having any detailed 
classical picture. 

When a state is formed by the superposition of two other states, it will 
have properties that are in some vague way intermediate between those of 
the two original states and that approach more or less closely to those of 
either of them according to the greater or less ‘weight’ attached to this state 
in the superposition process. The new state is completely defined by the two 
original states when their relative weights in the superposition process are known, 
together with a certain phase difference, the exact meaning of weights and phases 
being provided in the general case by the mathematical theory. In the case of 
the polarization of a photon their meaning is that provided by classical optics, 
so that, for example, when two perpendicularly plane polarized states are 
superposed with equal weights, the new state may be circularly polarized in 
either direction, or linearly polarized at an angle ¼π, or else elliptically polarized,
according to the phase difference. 
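Jones vectors and Stokes parameters are standard classical-optics tools, not introduced in the text, that make this statement easy to check. The sketch assumes the two perpendicular states are horizontal and vertical linear polarization, superposed with equal weights and a relative phase φ: S3 = 0 indicates linear polarization, S3 = ±1 circular polarization of either handedness, and intermediate values elliptical polarization.

```python
import cmath
import math


def superpose_equal_weights(phase):
    """Equal-weight superposition of horizontal (1, 0) and vertical (0, 1)
    linear polarization with relative phase `phase` (radians), as a Jones vector."""
    return (1 / math.sqrt(2), cmath.exp(1j * phase) / math.sqrt(2))


def stokes(jones):
    """Normalized Stokes parameters (S1, S2, S3) of a Jones vector (Ex, Ey)."""
    ex, ey = jones
    s0 = abs(ex) ** 2 + abs(ey) ** 2
    cross = ex.conjugate() * ey
    return ((abs(ex) ** 2 - abs(ey) ** 2) / s0, 2 * cross.real / s0, 2 * cross.imag / s0)


def classify(jones, tol=1e-9):
    """'linear' if S3 = 0, 'circular' if |S3| = 1, otherwise 'elliptical'."""
    s3 = stokes(jones)[2]
    if abs(s3) < tol:
        return "linear"
    if abs(abs(s3) - 1) < tol:
        return "circular"
    return "elliptical"


def linear_angle_degrees(jones):
    """Orientation of the major axis of the polarization ellipse from the x axis."""
    s1, s2, _ = stokes(jones)
    return math.degrees(0.5 * math.atan2(s2, s1))


for phase in (0.0, math.pi / 3, math.pi / 2, -math.pi / 2):
    print(f"phase {phase:+.3f} rad: {classify(superpose_equal_weights(phase))}")
print(linear_angle_degrees(superpose_equal_weights(0.0)))  # 45 degrees, up to rounding
```

Zero phase difference gives linear polarization at 45° (that is, ¼π), ±π/2 gives the two circular polarizations, and other phases give elliptical polarization.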

The non-classical nature of the superposition process is brought out clearly 
if we consider the superposition of two states, A and B, such that there exists 
an observation which, when made on the system in state A, is certain to lead 
to one particular result, a say, and when made on the system in state B is 
certain to lead to some different result, b say. What will be the result of 
the observation when made on the system in the superposed state? The answer 


superpostion of 
states 


wave mechanics 


12 


is that the result will be sometimes a and sometimes b, according to a probability 
law depending on the relative weights of A and B in the superposition process. 
It will never be different from both a and b. The intermediate character of the state 
formed by superposition thus expresses itself through the probability of a particular 
result for an observation being intermediate between the corresponding probabilities 
for the original states,¹ not through the result itself being intermediate between
the corresponding results for the original states. 
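A sketch of this statistical behaviour, assuming for illustration that the relative weights have already been turned into probabilities as w_A/(w_A + w_B) and w_B/(w_A + w_B); the exact rule connecting weights to probabilities is supplied later by the mathematical theory. Whatever the weights, every single observation returns a or b and never anything else, while the frequencies over many repetitions settle to definite fractions.

```python
import random


def observe(weight_a, weight_b, rng):
    """One observation on the state formed by superposing A and B.

    Assumes the weights are converted to probabilities as
    w_A / (w_A + w_B) and w_B / (w_A + w_B); the result is always 'a' or 'b'.
    """
    p_a = weight_a / (weight_a + weight_b)
    return "a" if rng.random() < p_a else "b"


rng = random.Random(1)
results = [observe(3.0, 1.0, rng) for _ in range(10_000)]
print(sorted(set(results)))               # ['a', 'b']: never anything else
print(results.count("a") / len(results))  # close to 0.75 for weights 3 : 1
```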

In this way we see that such a drastic departure from ordinary ideas as 
the assumption of superposition relationships between the states is possible 
only on account of the recognition of the importance of the disturbance 
accompanying an observation and of the consequent indeterminacy in the result 
of the observation. When an observation is made on any atomic system that is in 
a given state, in general the result will not be determinate, i.e., if the experiment 
is repeated several times under identical conditions several different results may 
be obtained. It is a law of nature, though, that if the experiment is repeated 
a large number of times, each particular result will be obtained in a definite 
fraction of the total number of times, so that there is a definite probability of 
its being obtained. This probability is what the theory sets out to calculate. 
Only in special cases when the probability for some result is unity is the result of 
the experiment determinate. 

The assumption of superposition relationships between the states leads to 
a mathematical theory in which the equations that define a state are linear in 
the unknowns. In consequence of this, people have tried to establish analogies 
with systems in classical mechanics, such as vibrating strings or membranes, 
which are governed by linear equations and for which, therefore, a superposition 
principle holds. Such analogies have led to the name ‘Wave Mechanics’ being 
sometimes given to quantum mechanics. It is important to remember, however, 
that the superposition that occurs in quantum mechanics is of an essentially 
different nature from any occurring in the classical theory, as is shown by the fact 
that the quantum superposition principle demands indeterminacy in the results 
of observations in order to be capable of a sensible physical interpretation. 
The analogies are thus liable to be misleading.

¹The probability of a particular result for the state formed by superposition is not always
intermediate between those for the original states in the general case when those for the original
states are not zero or unity, so there are restrictions on the ‘intermediateness’ of a state formed
by superposition.


5. Mathematical Formulation of the Principle


Let us consider the whole set of states of a particular dynamical system.
They will form an aggregate of things between which there will exist a number
of relationships of a special kind, arising from the principle of superposition.
These relationships we must now formulate in exact mathematical language. 

The superposition process is a kind of additive process and implies that states 
can in some way be added to give new states. Now any mathematical quantities 
which can be added to give new quantities of the same nature may be represented 
by vectors in a suitable vector space with a sufficiently large number of dimensions. 
We are thus led to represent the states of a system by vectors in a certain 
vector space. The vectors will be assumed to radiate from a common origin. 

We represent each state by a vector denoted by a symbol $\psi$. Different states are
represented by different vectors $\psi$, which may be distinguished by being provided
with different suffixes; thus the states A, B, C may be represented by $\psi_A$, $\psi_B$, $\psi_C$.
If the state A can be formed from the states B and C, then we assume that
the corresponding vectors $\psi_A$, $\psi_B$, $\psi_C$ are connected by an equation of the type

$$\psi_A = x_B\,\psi_B + x_C\,\psi_C, \tag{1}$$

where $x_B$ and $x_C$ are numbers.

From this assumption certain precise properties of the superposition 
process follow—properties which are in fact necessary for the word ‘superposition’ 
to be suitable. Since, when vectors are added, the order in which they are 
put is unimportant, it follows that when two or more states are superposed, 
the order in which they occur in the superposition process is unimportant. 
The superposition process is symmetrical between the states that are superposed. 
Further we see from equation (1) that (excluding the case when the coefficient
$x_B$ or $x_C$ is zero) if the state A can be formed by superposition of the states B
and C, then the state B can be formed by superposition of C and A, and C can be
formed by superposition of A and B. The superposition relationship is symmetrical 
between all three states A, B and C. Three states that are symmetrically related 
in this way will be called dependent. More generally, any set of states A, B,..., Z 
will be called dependent if there exists a relation between their representative 
vectors of the form 

$$x_A\psi_A + x_B\psi_B + \cdots + x_Z\psi_Z = 0, \tag{2}$$

where the coefficients $x_A, x_B, \ldots, x_Z$ are not all zero; otherwise they would be
called independent.
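Definition (2) can be tested mechanically: a set of vectors is dependent exactly when the matrix formed from them has rank smaller than the number of vectors. The stdlib-only sketch below assumes states are given as tuples of complex components in a finite-dimensional space (the text allows infinitely many dimensions, which a finite example cannot show). It also checks the symmetry argument by solving equation (1) for $\psi_B$. The example vectors and coefficients are hypothetical.

```python
def rank(vectors, tol=1e-10):
    """Rank of a list of equal-length complex vectors, by Gaussian elimination."""
    rows = [list(v) for v in vectors]
    if not rows:
        return 0
    r = 0
    for col in range(len(rows[0])):
        if r == len(rows):
            break
        # Choose the largest remaining entry in this column as pivot.
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def are_dependent(vectors):
    """True if a relation of the form (2), with coefficients not all zero,
    holds between the vectors, i.e. if their rank is less than their number."""
    return rank(vectors) < len(vectors)


# Hypothetical example vectors in a three-dimensional space.
psi_B = (1, 0, 0)
psi_C = (0, 1j, 0)
x_B, x_C = 2, 1 - 1j
psi_A = tuple(x_B * b + x_C * c for b, c in zip(psi_B, psi_C))  # equation (1)

print(are_dependent([psi_A, psi_B, psi_C]))      # True: A, B, C are dependent
print(are_dependent([psi_B, psi_C, (0, 0, 1)]))  # False: independent

# Symmetry: since x_B != 0, equation (1) can be solved for psi_B in terms of C and A.
recovered_B = tuple((a - x_C * c) / x_B for a, c in zip(psi_A, psi_C))
print(recovered_B)
```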

If we obtain the maximum number of independent states, this will give us 
the number of dimensions of our vector space. In most practical examples 
this number is infinite. The vector space is useful despite this, most of the reasoning
that we use it for being equally applicable whether the number of dimensions is 
finite or infinite. 

To proceed with the accurate formulation of the superposition principle we must 
introduce a further assumption, namely the assumption that by superposing a state
with itself we cannot form any new state, but only the original state over again.
If the original state is represented by the vector $\psi$, when it is superposed with
itself the result will be represented by

$$x_1\psi + x_2\psi = (x_1 + x_2)\,\psi,$$

where $x_1$ and $x_2$ are numbers. Now we may have $x_1 + x_2 = 0$, in which
case the result of the superposition process would be nothing at all, the two
components having cancelled each other out by an interference effect. Our new
assumption requires that, apart from this special case, the resulting state must be
the same as the original one, so that $(x_1 + x_2)\psi$ must represent the same state
that $\psi$ does. Now $x_1 + x_2$ is an arbitrary number and hence we can conclude
that if the representative vector of a state is multiplied by any number, not zero,
the resulting vector will represent the same state. Thus a state is specified by
the direction of a vector in the vector space and any length one may assign to
the vector is irrelevant. All the states of the dynamical system are in one-one
correspondence with all the possible directions for a vector in the vector space,
when one makes no distinction between the directions of the vectors $\psi$ and $-\psi$.
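The conclusion that only the direction of a vector matters can be written as an equivalence test. This sketch assumes a finite-dimensional space with vectors given as tuples of complex components, and uses the equality case of the Cauchy-Schwarz inequality: nonzero vectors u and v are multiples of one another exactly when |⟨u, v⟩|² = ⟨u, u⟩⟨v, v⟩. Following the text, the zero vector is treated as representing no state at all. The function names are hypothetical.

```python
def inner(u, v):
    """Complex inner product: sum of conj(u_i) * v_i."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def same_state(u, v, tol=1e-12):
    """True if the nonzero vectors u and v are multiples of one another,
    i.e. represent the same state. The zero vector represents no state."""
    nu, nv = inner(u, u).real, inner(v, v).real
    if nu < tol or nv < tol:
        raise ValueError("the zero vector does not represent any state")
    # Cauchy-Schwarz: |<u,v>|^2 <= <u,u><v,v>, with equality iff v = c u.
    return abs(abs(inner(u, v)) ** 2 - nu * nv) <= tol * nu * nv


psi = (1 + 0j, 2j)
print(same_state(psi, tuple(-3j * a for a in psi)))        # True: complex multiple
print(same_state(psi, (1, -2j)))                           # False: different direction
x1, x2 = 0.5, 1.5
print(same_state(psi, tuple((x1 + x2) * a for a in psi)))  # True: psi superposed with itself
```

Superposing ψ with itself with $x_1 + x_2 = 0$ gives the zero vector, for which the function raises an error rather than naming a state.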

The new assumption above shows up very clearly the fundamental difference 
between the superposition of the quantum theory and any kind of classical 
superposition. In the case of a classical system for which a superposition 
principle holds, for instance a vibrating membrane, when one superposes 
a state with itself the result is a different state, with a different magnitude 
of the oscillations. There is no physical characteristic of a quantum state 
corresponding to the magnitude of the classical oscillations, as distinct from 
their quality, described by the ratios of the amplitudes at different points of 
the membrane. Again, while there exists a classical state with zero amplitude 
of oscillation everywhere, namely the state of rest, there does not exist any 
corresponding state for a quantum system, the zero ket vector corresponding to 
no state at all. 

One further assumption is necessary to complete the mathematical 
formulation of the principle of superposition. This is the assumption that 
in an equation expressing a superposition relationship, such as equation (1) 
or (2), the coefficients x can be complex numbers, and in the statement 
‘if the representative vector of a state is multiplied by any number, not zero, 
the resulting vector will represent the same state’, the multiplying number can 
be complex. 

The need for the allowing of complex coefficients can be seen in the two 
examples discussed in §§2 and 3, in which it is clear that from the superposition of 
two given states a twofold infinity of states may be obtained. In fact in the example 
of §2, there are just two independent states of polarization for a photon, which may
be taken to be the states of linear polarization parallel and perpendicular to 
some fixed direction, and from the superposition of these two a twofold infinity 
of states of polarization can be obtained, namely all the states of elliptic 
polarization, the general one of which requires two parameters to describe it. 
Again, in the example of §3, from the superposition of two given states of 
motion for a photon a twofold infinity of states of motion may be obtained, 
the general one of which is described by two parameters, which may be taken 
to be the ratio of the amplitudes of the two wave functions that are added 
together and their phase relationship. Now if, in the superposition equation (1), 
the coefficients $x_B$ and $x_C$ were restricted to be real numbers, then, since only
their ratio is of importance for determining the direction of the resultant vector $\psi_A$
when $\psi_B$ and $\psi_C$ are given, there would be only a simple infinity of states
obtainable from the superposition. The allowing of complex coefficients increases 
this to a twofold infinity. 
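The parameter count can be written out explicitly. Assuming $x_C \neq 0$, only the ratio $z = x_B/x_C$ affects the direction of $\psi_A$:

$$\psi_A = x_C\,(z\,\psi_B + \psi_C), \qquad z = \frac{x_B}{x_C}.$$

If $z$ is restricted to real values, the states obtained form a one-parameter family (a simple infinity). If $z$ may be complex, writing $z = r\,e^{i\phi}$ with $r \ge 0$ and $0 \le \phi < 2\pi$ gives two real parameters, $r$ and $\phi$, which play the roles of the ratio of the amplitudes and the phase relationship; this is the twofold infinity. The state $\psi_B$ itself corresponds to the excluded case $x_C = 0$.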

Our assumption of complex coefficients implies that in every case of 
superposition of two different given states, a twofold infinity of states may 
be obtained. The vectors $\psi$ representing the states are complex vectors, there being
a twofold infinity of them with extremities on any given line in the vector space. 


6. Analysis of the Principle 


The principle of superposition that we have been discussing, applying to the states 
of any atomic system, is in agreement with the restricted principle of relativity, 
as it involves no reference to any particular Lorentz frame of reference. It would 
be desirable to develop the whole of quantum mechanics relativistically but at 
the present time¹ this is not practicable, since relativistic quantum mechanics
has as yet only a very limited applicability. There exists at present a general
and logical scheme of non-relativistic quantum mechanics, yielding results in
agreement with experiment, but, although one can obtain a formal extension
of the scheme satisfying relativity requirements, this extension is not applicable
to practical problems except with the help of approximations that are not
mathematically justifiable.

¹[1935]

The greater part of the present book will be concerned with the non-relativistic
quantum mechanics, which is now as precise and as general as classical mechanics,
to which it has, in fact, a strong analogy. The work will thus refer to one absolute
time. The theory then naturally divides itself into two parts, part (i) dealing with
relations and laws of nature governing the state of affairs in an atomic system at
one instant of time, and part (ii) dealing with the connexion between the state
of affairs at one instant of time and at a slightly later instant. Part (ii) will
contain the analogue of the equations of motion of classical mechanics and will,
in fact, be a neat mathematical generalization of that scheme of equations. Part (i) 
will give essentially the theory of the limitations of one’s powers of observation 
of a small system and its classical analogue will consist mainly of trivialities, 
since classical theory assumes there are no such limitations. A certain section 
of part (i), though (dealt with in Chapter V), will have a non-trivial classical 
analogue, concerned with the important dynamical notions of conjugate variables, 
contact transformations and related things. 

Historically, part (ii) was the first to be discovered. People guessed at 
the quantum generalization of the classical equations of motion and then proceeded 
to work with the quantum equations of motion, only gradually learning their proper 
physical significance and the limitations which they require in the possibilities of 
observation. In a logical exposition of the quantum theory, though, part (i) should 
be put first. This will accordingly be done in the present book, part (i) being 
dealt with in Chapters II and V and the equations of motion being introduced 
in Chapter VI. 

With the recognition of the natural separation of the theory into two parts, 
(i) and (ii), it becomes desirable to use the word ‘state’ in a rather different 
sense from that in which we have been using it up to the present. As we have 
been using it and as it comes in in the general formulation of the principle 
of superposition, it refers to the condition of the dynamical system throughout 
all time—something which, in the classical theory, would be described by a set of 
functions of the time which satisfy certain equations of motion. The preferable 
sense in which to use the word ‘state’ is to make it refer to the condition of 
the dynamical system at one instant of time—something which, in the classical 
theory, would be described merely by a set of numerical values for the dynamical 
variables. With the old meaning for the word, a dynamical system remains 
permanently in one state and just follows out the course of motion in that state; 
with the new meaning a dynamical system is at each instant of time in a definite 
state and is continually changing from one state to another (or, as we may say, 
the state that the dynamical system is in is continually changing) under 
the influence of the equations of motion. 

The old meaning is probably the more fundamental from an abstract 
theoretical point of view, since it is relativistic, referring to conditions 
throughout space-time, while the new meaning is non-relativistic, referring 
to conditions in a three-dimensional section of space-time belonging to 
one time-instant. The new meaning, though, is better adapted to the line of 
development of the theory that we shall follow. It allows us to say that part (i) deals 
with the relations between the possible ‘states’ in which a dynamical system may 
be at any instant of time, and part (ii) deals with the connexion between the ‘state’ 


6. Analysis of the Principle 17 


at one instant and that at a slightly later instant. The new meaning will therefore 
be used throughout the book¹ except in a few places where otherwise stated.

If we now examine the general principle of superposition, applying to 
the ‘states’ of a system in the old sense, from our new non-relativistic point of view, 
we see that this principle resolves itself into two distinct hypotheses. One of these is 
a principle of superposition applying to the ‘states’ in the new sense. Between such 
states there must exist superposition relationships of just the same character as 
those between the old kind of states. The whole of §5 will apply equally well to 
the new kind of states. The other hypothesis is that, if we take certain states at 
one instant of time that are connected by some superposition relationship, so that 
their representative vectors satisfy an equation of the type (1) or (2), then in 
the course of time these states will change in such a way that they always remain 
connected by this superposition relationship, their representative vectors $\psi$ varying
in such a way that equation (1) or (2) continues to hold, with constant coefficients. 
This second hypothesis turns the assumption of superposition relationships 
between the states at one instant of time into an assumption of superposition 
relationships holding between the various possible motions throughout all time, 
as required by the general principle of superposition. 
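A small numerical check of what this second hypothesis asserts, under assumptions that go beyond this section: the space is two-dimensional, and the change of every representative vector over one time step is given by the same fixed 2×2 complex matrix `EVOLVE`. That matrix is a hypothetical choice; the actual equations of motion are set up only in part (ii). Because the matrix acts linearly, the coefficients in equation (1) come out unchanged after each step, which is the constancy the hypothesis requires.

```python
import cmath
import math

THETA = 0.4  # hypothetical parameter of the illustrative evolution
EVOLVE = ((math.cos(THETA), -1j * math.sin(THETA)),
          (-1j * math.sin(THETA), math.cos(THETA)))


def apply(matrix, v):
    """Apply a 2x2 complex matrix (nested tuples) to a 2-vector."""
    return tuple(sum(m * x for m, x in zip(row, v)) for row in matrix)


def coefficients(psi_a, psi_b, psi_c):
    """Solve psi_a = x_b * psi_b + x_c * psi_c for (x_b, x_c) by Cramer's rule."""
    det = psi_b[0] * psi_c[1] - psi_c[0] * psi_b[1]
    x_b = (psi_a[0] * psi_c[1] - psi_c[0] * psi_a[1]) / det
    x_c = (psi_b[0] * psi_a[1] - psi_a[0] * psi_b[1]) / det
    return x_b, x_c


x_b, x_c = 2 - 1j, 0.5j
psi_b, psi_c = (1, 0), (1, 1j)
psi_a = tuple(x_b * b + x_c * c for b, c in zip(psi_b, psi_c))  # equation (1)

states = (psi_a, psi_b, psi_c)
for step in range(5):
    states = tuple(apply(EVOLVE, v) for v in states)
    print(step + 1, coefficients(*states))  # stays (2-1j, 0.5j) up to rounding

print(all(cmath.isclose(a, b) for a, b in zip(coefficients(*states), (x_b, x_c))))  # True
```

Any invertible linear map would do for this point; the particular matrix is unitary only to keep the example tame.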

The two hypotheses into which we have analysed the general principle of 
superposition belong respectively to parts (i) and (ii) of our theory. The principle 
of superposition of states in the new sense is one of the fundamental assumptions 
on which part (i) of the theory will be built, while the hypothesis of the constancy 
throughout time of any superposition relationship provides the basic assumption 
in the derivation of the equations of motion and the setting up of part (ii). 


¹This is an alteration from the first edition, where the old meaning was used throughout.


II. STATES AND OBSERVABLES


7. The Vector Space representing the States 


DURING the twentieth¹ century a profound change has taken place in the opinions
physicists have held on the foundations of their subject. Previously they supposed 
that the principles of Newtonian mechanics would provide the basis for 
the description of the whole of physical phenomena and that all the theoretical 
physicist had to do was suitably to develop and apply these principles. 
With the recognition that there is no logical reason why Newtonian and other 
classical principles should be valid outside the domains in which they have been 
experimentally verified has come the modern point of view that departures from 
these principles are indeed necessary. Such departures find their expression 
through the introduction of new mathematical formalisms, new schemes of axioms 
and rules of manipulation, into the methods of theoretical physics. 

Quantum mechanics provides a good example of the new ideas. It requires 
the states of a dynamical system and the observations that can be made on 
the system to be interconnected in ways that appear strange and unfamiliar 
from the classical standpoint. This results in the states and observations 
being represented by mathematical quantities of different natures from those 
ordinarily used. The new scheme becomes a precise physical theory when all 
the axioms and rules of manipulation governing the mathematical quantities are 
specified and when in addition certain laws are laid down connecting physical 
facts with the mathematical formalism, so that from any given physical conditions 
equations between the mathematical quantities may be inferred and vice versa. 
In an application of the theory one would be given certain physical information, 
which one would proceed to express by equations between the mathematical 
quantities. One would then deduce new equations with the help of the axioms 
and rules of manipulation and would conclude by interpreting these new equations 
as physical conditions. The justification for the whole scheme depends, apart from 
internal consistency, on the agreement of the final results with experiment. 

The present chapter will be concerned with the foundation of the scheme in so
far as it applies to the states of a dynamical system at one particular time and
observations made on the system at that time. We begin with the idea introduced
in §5 of representing each state by the direction of a vector in a certain vector space.
(We saw in §6 that this idea is valid for the states at a particular time.) It is
now necessary to discuss the geometrical nature of the vector space, in particular
the possibility of the existence of relations of perpendicularity between the vectors.

†[‘present’ replaced by ‘twentieth’]

A convenient way of describing the geometrical nature of the vector space is
by introducing a coordinate system of the simplest type possible and discussing
the transformations of coordinates arising from the passage to other coordinate
systems that are equally simple. Let the coordinates of a vector $\psi_a$ be the set of
numbers $a_1, a_2, a_3, \ldots$. These numbers must in general be complex, since, as we saw
in §5, we can multiply the vectors by complex numerical coefficients and then add
them to other vectors. If we make a passage to a new coordinate system, in which
the coordinates of the vector $\psi_a$ are $a'_1, a'_2, a'_3, \ldots$, then the new coordinates will
be connected with the old ones by linear relations of the type

$$a'_r = \sum_s \gamma_{rs}\, a_s, \qquad (1)$$

where the $\gamma_{rs}$ are numbers which depend only on the two coordinate systems and
not on the vector $\psi_a$.

We now make the assumption that the $\gamma_{rs}$ may be and in general are complex
numbers, even when the two coordinate systems are both of the simplest type
possible. The effect of this is that if the coordinates of $\psi_a$ are real in one coordinate
system they will in general be complex in the other. Thus one can give no invariant
meaning to the vector $\psi_a$ being real. One cannot have a real vector in the vector
space and one cannot split up a general vector into real and† imaginary parts.

Consider now the conjugate complex numbers to the coordinates of $\psi_a$.
These conjugate complex numbers will also transform according to a linear law,
namely the law

$$\bar a'_r = \sum_s \bar\gamma_{rs}\, \bar a_s, \qquad (2)$$

where the bar over a number denotes its conjugate complex. They may thus
be considered as the coordinates of a vector in some vector space. It will be
a different vector space from that of the $\psi$'s, though, since the transformation
law (2) is different from (1), on account of the transformation coefficients $\bar\gamma_{rs}$
being in general different from the $\gamma_{rs}$. There will be no meaning for the sum
of one of the vectors in the new vector space with one of the $\psi$'s in the original
vector space. The two vector spaces will not, of course, be entirely disconnected
but must be related in a special way, since each transformation of coordinates in
one of them is associated with a definite transformation of coordinates in the other.
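Laws (1) and (2) can be checked numerically. The following is a minimal Python sketch, assuming a two-dimensional space and one hypothetical choice of complex coefficients $\gamma_{rs}$ picked only for illustration; the helper names `transform`, `conj` and `close` are likewise inventions of this sketch. It shows that coordinates which are real in one system become complex in the other, and that the conjugate coordinates follow law (2) with $\bar\gamma_{rs}$, not law (1) with $\gamma_{rs}$.

```python
import math


def transform(gamma, coords):
    """Law (1): new coordinate r is the sum over s of gamma[r][s] * coords[s]."""
    return [sum(g_rs * c_s for g_rs, c_s in zip(row, coords)) for row in gamma]


def conj(values):
    """Conjugate complex of each entry of a vector, or of each row of a matrix."""
    return [conj(v) if isinstance(v, list) else v.conjugate() for v in values]


def close(u, v, tol=1e-12):
    """True when two coordinate lists agree entry by entry up to rounding."""
    return all(abs(p - q) < tol for p, q in zip(u, v))


# Hypothetical complex coefficients gamma_rs linking two coordinate systems.
c = 1 / math.sqrt(2)
gamma = [[c, 1j * c],
         [1j * c, c]]

a = [1.0, 2.0]                 # real coordinates of psi_a in the old system
a_new = transform(gamma, a)    # law (1): the new coordinates are complex
print(a_new)

# Law (2): the conjugate coordinates transform with conj(gamma) ...
print(close(transform(conj(gamma), conj(a)), conj(a_new)))   # True
# ... and not with gamma itself.
print(close(transform(gamma, conj(a)), conj(a_new)))         # False
```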


†[‘pure’ omitted]




We shall call the vectors in the new vector space $\phi$'s. That one of them whose
coordinates are the conjugate complex numbers of the coordinates of a $\psi$ with any
specified suffix will be denoted by $\phi$ with the same suffix. Thus $\phi_a$ is the vector
whose coordinates are the conjugate complex numbers to those of $\psi_a$. Two vectors
such as $\phi_a$ and $\psi_a$, whose coordinates are conjugate complex numbers, we shall
define to be conjugate imaginary vectors. We use the words ‘conjugate imaginary’
instead of ‘conjugate complex’, since the relation between $\phi_a$ and $\psi_a$ is not quite
the same as the relation between a pair of ordinary conjugate complex numbers,
on account of its† being possible neither to add together $\phi_a$ and $\psi_a$ nor to split
up $\phi_a$ and $\psi_a$ into real and‡ imaginary parts. The words ‘conjugate complex’ and
the notation of putting a bar over a quantity to get its conjugate complex will
be reserved for quantities which can be split up into real and‡ imaginary parts.
Thus we shall speak of the conjugate imaginary of a vector $\phi_a$, but of the conjugate
complex of the coordinates of this vector in any specified coordinate system,
the coordinates being just ordinary numbers.

Each vector $\psi_a$ in the space of $\psi$'s determines uniquely a vector $\phi_a$ in the space
of $\phi$'s and vice versa. Thus the space of $\phi$'s provides a representation of the states
of our dynamical system just as well as the space of $\psi$'s, each state being associated
with one direction in the space of $\phi$'s. There is, in fact, perfect symmetry between
the $\phi$'s and $\psi$'s, which symmetry will survive all through the theory.

We now introduce a further and final geometrical property of the space of $\psi$'s.
We assume that, if $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ are the coordinates of any
two vectors $\psi_a$ and $\psi_b$ referred to one of the simplest coordinate systems, then,
in the passage to any other of the simplest coordinate systems, the coordinates
will transform in such a way that the number

$$\bar a_1 b_1 + \bar a_2 b_2 + \bar a_3 b_3 + \cdots \qquad (3)$$

remains invariant. This assumption imposes certain conditions on the coefficients
$\gamma_{rs}$ in (1). The number (3) may be regarded as the scalar product of the vector
$\psi_b$ with the vector $\phi_a$, the conjugate imaginary of $\psi_a$, and may be denoted by
the symbolic product $\phi_a\psi_b$. There is no invariant of the type $a_1b_1 + a_2b_2 + a_3b_3 + \cdots$
or $\bar a_1\bar b_1 + \bar a_2\bar b_2 + \bar a_3\bar b_3 + \cdots$, corresponding to the scalar product of $\psi_a$ with $\psi_b$
or of $\phi_a$ with $\phi_b$, and thus symbolic products of the type $\psi_a\psi_b$ or $\phi_a\phi_b$ never occur in
the theory.
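The conditions that invariance of (3) places on the $\gamma_{rs}$ amount to requiring $\sum_r \bar\gamma_{rs}\gamma_{rt}$ to be 1 when $s = t$ and 0 otherwise (a unitary matrix). As a minimal sketch, assuming one hypothetical coefficient matrix of this kind and arbitrary illustrative coordinates, the Python below shows (3) keeping its value under law (1) while the sum $a_1b_1 + a_2b_2 + \cdots$ does not. The function names are this sketch's own.

```python
import math


def transform(gamma, coords):
    """Law (1): new coordinate r is the sum over s of gamma[r][s] * coords[s]."""
    return [sum(g * x for g, x in zip(row, coords)) for row in gamma]


def invariant_3(a, b):
    """Expression (3): conj(a_1) b_1 + conj(a_2) b_2 + ..."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def bilinear(a, b):
    """a_1 b_1 + a_2 b_2 + ...: not an invariant once the gamma_rs are complex."""
    return sum(x * y for x, y in zip(a, b))


c = 1 / math.sqrt(2)
gamma = [[c, 1j * c],
         [1j * c, c]]          # hypothetical coefficients satisfying the condition above

a = [1 + 2j, 3 - 1j]           # coordinates of psi_a
b = [0.5j, 2 + 0j]             # coordinates of psi_b
a2, b2 = transform(gamma, a), transform(gamma, b)

before, after = invariant_3(a, b), invariant_3(a2, b2)
bl_before, bl_after = bilinear(a, b), bilinear(a2, b2)
print(before, after)           # the same number, up to rounding
print(bl_before, bl_after)     # two different numbers
```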

An invariant of the type (3) is not unusual in pure mathematics. It forms
an interesting generalization to the case of complex coordinates of the ordinary
scalar product $a_1b_1 + a_2b_2 + a_3b_3 + \cdots$ of two vectors with real coordinates
$a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ in ordinary Euclidean space. The invariant (3) introduces
a closer and more familiar connexion between the $\psi$'s and the $\phi$'s. Instead of
picturing the $\psi$'s and $\phi$'s as vectors in two different vector spaces, we may picture
them as two different kinds of vector associated with the same space. The relation
between these two kinds of vector is then just the one well known in differential
geometry as the relation between covariant and contravariant vectors.

†[‘not’ replaced by ‘neither ... nor ...’]
‡[‘pure’ omitted]

The symbolic product notation $\phi_a\psi_b$ is very convenient for general discussions
and will be extensively used in this book. When using it, we shall make
the convention always to put the $\phi$-symbol to the left of the $\psi$-symbol,
since the notation then fits in very well with the matrix notation that will be
developed later. As before remarked, products like $\psi_a\psi_b$ and $\phi_a\phi_b$ never occur.

From the definition (3) it follows at once that the symbolic product $\phi_a\psi_b$
is subject to the usual algebraic axioms for the product of two quantities,
as exemplified by the following equations:

$$\begin{aligned}
&\phi_a(\psi_b + \psi_c) = \phi_a\psi_b + \phi_a\psi_c,\\
&(\phi_a + \phi_b)\psi_c = \phi_a\psi_c + \phi_b\psi_c,\\
\text{and}\quad &\phi_a(k\psi_b) = (k\phi_a)\psi_b = k(\phi_a\psi_b),
\end{aligned}$$

where $k$ is any number. Further results that follow immediately from the definition
are that the two numbers $\phi_a\psi_b$ and $\phi_b\psi_a$ are conjugate complex,

$$\phi_a\psi_b = \overline{\phi_b\psi_a}, \qquad (4)$$

and the number $\phi_a\psi_a$ is always real and positive except in the special case when
the vector $\psi_a$ vanishes. This number $\phi_a\psi_a$ may be called the square of the length
of the vector $\psi_a$ or of the vector $\phi_a$, in agreement with the meaning of length for
an ordinary vector with real coordinates.
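Equation (4) and the positivity of $\phi_a\psi_a$ can be seen on a small example. This is a minimal Python sketch, assuming three arbitrary illustrative coordinates per vector; `phi_of` simply forms the conjugate complex coordinates, as in the definition of $\phi_a$, and `product` evaluates the sum (3).

```python
def phi_of(psi):
    """Coordinates of the conjugate imaginary phi-vector: the conjugates of psi's."""
    return [z.conjugate() for z in psi]


def product(phi, psi):
    """Symbolic product phi psi, evaluated as the sum (3)."""
    return sum(p * q for p, q in zip(phi, psi))


psi_a = [1 + 1j, 2 - 3j, 0.5j]
psi_b = [2 + 0j, -1j, 1 + 1j]

lhs = product(phi_of(psi_a), psi_b)
rhs = product(phi_of(psi_b), psi_a).conjugate()
print(lhs, rhs)        # equal: equation (4)

square_length = product(phi_of(psi_a), psi_a)
print(square_length)   # real and positive: 2 + 13 + 0.25
```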

It will frequently happen in the course of development of the theory and also in
its applications that we shall have to introduce a vector $\psi$ or $\phi$ whose direction is
fixed by special considerations referring to the problem in hand, but whose length
is not so fixed. It is then often convenient to choose the length to be equal to unity.
This procedure is called normalization and the vector so chosen that its length is
unity is said to be normalized. It should be noted that the vector is not, even then,
completely determined, since one can always multiply it by any number of modulus
unity, i.e. of the type $e^{i\gamma}$ where $\gamma$ is real, without changing either its direction or
its length. We call such a number a phase factor.
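Normalization and the phase-factor ambiguity can be sketched as follows, assuming illustrative coordinates and an arbitrarily chosen $\gamma = 0.7$; the helpers `length` and `normalize` are hypothetical names for this sketch. Both the normalized vector and its phase-multiplied copy have unit length and differ only by a factor of modulus unity.

```python
import cmath
import math


def length(psi):
    """Square root of phi psi, i.e. the length of psi."""
    return math.sqrt(sum(abs(z) ** 2 for z in psi))


def normalize(psi):
    """Rescale psi to unit length without changing its direction."""
    n = length(psi)
    if n == 0:
        raise ValueError("a vanishing vector has no direction to normalize")
    return [z / n for z in psi]


psi = [3 + 0j, 4j]                    # length 5
u = normalize(psi)                    # coordinates 0.6 and 0.8i
phase = cmath.exp(1j * 0.7)           # phase factor e^{i gamma}, gamma chosen arbitrarily
u_phased = [phase * z for z in u]     # same direction, still normalized

print(length(u), length(u_phased))
```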

If a $\phi$-vector and a $\psi$-vector are such that their product $\phi\psi$ is zero, we shall
say that these two vectors are orthogonal. We shall also say that two $\psi$'s are
orthogonal if the product of either with the conjugate imaginary of the other
is zero. Thus $\psi_a$ and $\psi_b$ are orthogonal if $\phi_a\psi_b = 0$ or if $\phi_b\psi_a = 0$, these two
conditions being, of course, equivalent on account of (4). A similar definition will
hold for the orthogonality of two $\phi$'s. Further, we shall say that two states are
orthogonal if the vectors representing these states are orthogonal.
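The orthogonality test can be sketched in a few lines, assuming two illustrative vectors; `orthogonal` is a hypothetical helper. The example also shows why the conjugation matters: the same pair gives a non-zero value for the plain sum $a_1b_1 + a_2b_2$, which plays no part in the definition.

```python
def phi_of(psi):
    return [z.conjugate() for z in psi]


def product(phi, psi):
    return sum(p * q for p, q in zip(phi, psi))


def orthogonal(psi_a, psi_b, tol=1e-12):
    """psi_a and psi_b are orthogonal when phi_a psi_b vanishes (equivalently phi_b psi_a)."""
    return abs(product(phi_of(psi_a), psi_b)) < tol


psi_a = [1 + 0j, 1j]
psi_b = [1 + 0j, -1j]
print(orthogonal(psi_a, psi_b), orthogonal(psi_b, psi_a))  # True True
# Without conjugation: 1*1 + (1j)*(-1j) = 2, which is not zero.
```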




8. Observables as Linear Operators 


The preceding section completes all that can be said about the relationships 
between the states of a system at a particular time. To continue with 
the development of the theory we must introduce observations into the discussion. 
We shall be concerned here only with observations made at this same 
particular time. Each such observation consists in the measurement of the value 
at this time of some dynamical coordinate or momentum, or some function of 
the coordinates and momenta. It will be convenient to introduce a special word 
for these things that get measured, as they play such an important part in 
the theory. We shall call each of them an observable. Thus an observation consists 
in the measurement of an observable. 

In the present section we shall deal only with the general relations which exist 
between observables and which connect observables with states. The discussion 
of the measurements of observables and the way in which the numerical results of 
such measurements appear in the theory will be left to the next section. 

We make the fundamental assumption that each observable is represented in
the mathematical formalism by a linear operator that can operate on the $\psi$-vectors.
By a linear operator is meant an operator which, operating on any $\psi$-vector,
changes that $\psi$-vector into another $\psi$-vector whose coordinates are linear functions
of the coordinates of the first one. Thus, when it operates on the vector $\psi_x$ with
coordinates $x_r$ $(r = 1, 2, 3, \ldots)$, it will change that vector into some vector, $\psi_b$ say,
whose coordinates $b_r$ are connected with the $x_r$ by relations of the type

$$b_r = \sum_s \alpha_{rs}\, x_s, \qquad (5)$$

where the $\alpha_{rs}$ are numbers (in general complex), which depend only on the operator
and not on the vector $\psi_x$.

The numbers $\alpha_{rs}$ may be called the coordinates of the operator. They differ
from the coordinates of a vector in that there are many more of them, each of
them requiring two suffixes to label it instead of one. If we had to write them out
explicitly, the natural way of arranging them would be as a two-dimensional array,
thus:

$$\begin{matrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \cdots \\
\alpha_{21} & \alpha_{22} & \alpha_{23} & \cdots \\
\alpha_{31} & \alpha_{32} & \alpha_{33} & \cdots \\
\cdots & \cdots & \cdots & \cdots
\end{matrix}$$

Such an array is called a matrix and the numbers are called the elements of
the matrix. We make the convention that the elements must always be arranged
so that those in the same row have the same first suffix and those in the same
column have the same second suffix. An element whose two suffixes are the same,
such as $\alpha_{rr}$, is called a diagonal element, as all such elements lie on a diagonal of
the array.

There is a symbolic notation which can conveniently be used in connexion with
linear operators, corresponding to that which we had for the scalar product in
the preceding section. The linear operator with coordinates $\alpha_{rs}$ we call the linear
operator $\alpha$, and when it acts on any vector $\psi_x$, the resulting vector is regarded
as the symbolic product of $\alpha$ with $\psi_x$ and is written $\alpha\psi_x$, with the operator to
the left of the $\psi$-symbol. Thus when the coordinates of two vectors $\psi_x$ and $\psi_b$
and of a linear operator $\alpha$ are connected by equation (5), the relation between
the vectors and linear operator may be expressed by the symbolic equation

$$\psi_b = \alpha\psi_x. \qquad (6)$$


It is easily seen that for any two vectors $\psi_x$ and $\psi_y$,

$$\alpha(\psi_x + \psi_y) = \alpha\psi_x + \alpha\psi_y,$$

and for any number $k$,

$$\alpha(k\psi_x) = k(\alpha\psi_x).$$

These equations, in fact, are just those that express in the symbolic notation
the linearity property of the operator $\alpha$.
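Equation (5) and the two linearity equations translate directly into code. This is a minimal Python sketch, assuming a hypothetical $2 \times 2$ set of coordinates $\alpha_{rs}$ and illustrative vectors; `apply` and `close` are names chosen for this sketch only.

```python
def apply(alpha, psi):
    """Equation (5): coordinate r of alpha psi is the sum over s of alpha[r][s] * x_s."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, psi)) for row in alpha]


def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))


alpha = [[1 + 0j, 2j],
         [0.5 + 0j, -1 + 1j]]    # hypothetical coordinates alpha_rs
psi_x = [1 + 0j, 2 - 1j]
psi_y = [1j, 3 + 0j]
k = 2 - 0.5j

psi_b = apply(alpha, psi_x)      # symbolic equation (6): psi_b = alpha psi_x

# alpha(psi_x + psi_y) = alpha psi_x + alpha psi_y
sum_ok = close(apply(alpha, [x + y for x, y in zip(psi_x, psi_y)]),
               [u + v for u, v in zip(apply(alpha, psi_x), apply(alpha, psi_y))])
# alpha(k psi_x) = k(alpha psi_x)
scale_ok = close(apply(alpha, [k * x for x in psi_x]),
                 [k * u for u in psi_b])
print(psi_b, sum_ok, scale_ok)
```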

The assumption that observables are represented by linear operators seems at 
first sight to be a very drastic and unexpected departure from ordinary ideas. 
It appears much more reasonable, though, when one examines the properties of 
linear operators and sees how well fitted they are to play the part of observables. 
In the mathematics of classical mechanics, observables are quantities which we can 
add to one another or multiply with one another or form algebraic functions of, 
the results of these processes being other observables. Now the theory of linear 
operators can be developed so that we can add and multiply linear operators and 
form algebraic functions of them, the results of these processes being other linear 
operators. Thus linear operators can be handled mathematically in much the same 
way in which one is used to handling observables in classical mechanics. 

The sum of two linear operators is defined as that linear operator which,
operating on any vector $\psi_x$, changes this vector into the sum of the two vectors
into which $\psi_x$ is changed by the two operators individually. Thus, in the symbolic
notation, the sum $\alpha + \beta$ of two linear operators $\alpha$ and $\beta$ is defined by the equation

$$(\alpha + \beta)\psi_x = \alpha\psi_x + \beta\psi_x,$$

holding for all $\psi_x$. Similarly, the product of a linear operator $\alpha$ with a number
$k$ is defined as that linear operator which, operating on any vector $\psi_x$,
changes this vector into $k$ times the vector into which $\psi_x$ is changed by $\alpha$.
In the symbolic notation,

$$(k\alpha)\psi_x = k(\alpha\psi_x).$$


With the help of these two definitions one can form linear functions, with arbitrary 
numerical coefficients, of the linear operators. 

The product of two linear operators is defined as that linear operator
which produces by itself the same effect as the two operators in succession.
Thus the product $\alpha\beta$ is defined as the operator which, operating on a vector $\psi_x$,
changes it into that vector which one would get by operating first on $\psi_x$ with $\beta$,
and then on the result of the first operation with $\alpha$. In symbols,

$$(\alpha\beta)\psi_x = \alpha(\beta\psi_x).$$

In general this would not be the same as operating first with $\alpha$ and then with $\beta$,
so that the product $\alpha\beta$ in general differs from $\beta\alpha$. The commutative axiom of
multiplication does not hold for linear operators. It may happen in a special case
that two linear operators $\alpha$ and $\beta$ are such that $\alpha\beta$ and $\beta\alpha$ are equal. In this case
we say that $\alpha$ commutes with $\beta$, or that $\alpha$ and $\beta$ commute.
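Writing out $\alpha(\beta\psi_x)$ with equation (5) twice shows that the product operator $\alpha\beta$ has coordinates $\sum_t \alpha_{rt}\beta_{ts}$. The Python below is a minimal sketch of that, assuming two hypothetical $2 \times 2$ operators chosen so that they do not commute; `product_operator` is a name invented for the sketch.

```python
def apply(alpha, psi):
    """Equation (5) applied to the coordinates of psi."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, psi)) for row in alpha]


def product_operator(alpha, beta):
    """Coordinates of alpha beta ("first beta, then alpha").

    Using equation (5) twice in alpha(beta psi) gives
    (alpha beta)_rs = sum over t of alpha_rt * beta_ts.
    """
    n = len(beta)
    return [[sum(alpha[r][t] * beta[t][s] for t in range(n))
             for s in range(len(beta[0]))]
            for r in range(len(alpha))]


def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))


alpha = [[0, 1], [1, 0]]        # two hypothetical operators
beta = [[1, 0], [0, -1]]
psi = [2 + 1j, -3j]

ab = product_operator(alpha, beta)
ba = product_operator(beta, alpha)
print(close(apply(ab, psi), apply(alpha, apply(beta, psi))))  # True
print(ab, ba)   # [[0, -1], [1, 0]] [[0, 1], [-1, 0]]: alpha and beta do not commute
```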

It is easily seen that the other multiplication axioms of ordinary algebra, 
the associative and distributive axioms, as well as all the addition axioms, are valid 
for linear operators. Thus one can build up an algebra for linear operators which 
runs very similar to ordinary algebra. For instance, by repeated applications of 
the processes of addition and multiplication one can construct functions of linear 
operators, in fact all those functions that can be expressed as power series. 

When we say that observables are represented by linear operators, it is implied 
that the algebraic relations which exist between any observables are the same as 
the algebraic relations between the linear operators representing those observables. 
Thus the essential mathematical significance of the assumption that observables 
are represented by linear operators is that observables are subject to an algebra 
which is the same as ordinary algebra with the exception that the commutative 
axiom of multiplication does not hold. 

Up to the present we have considered observables only in connexion with
$\psi$-vectors. To maintain symmetry between the $\psi$'s and the $\phi$'s, we must have
the possibility of representing an observable by a linear operator operating
on the $\phi$-vectors. This possibility can be deduced from the assumption that
observables are represented by linear operators operating on the $\psi$-vectors,
together with the relations between $\phi$'s and $\psi$'s established in the preceding section.

Let us form the scalar product of the $\psi$-vector $\alpha\psi_x$ with an arbitrary
$\phi$-vector $\phi_y$. If the coordinates of $\phi_y$ are the numbers $\bar y_r$, then, the coordinates of
$\alpha\psi_x$ being given by the right-hand side of (5), the scalar product has the value

$$\phi_y(\alpha\psi_x) = \sum_r \bar y_r \sum_s \alpha_{rs}\, x_s.$$

This may be written in the form

$$\sum_s \bar a_s\, x_s, \qquad (7)$$

where

$$\bar a_s = \sum_r \bar y_r\, \alpha_{rs}. \qquad (8)$$

Now the $\bar a_s$ must be such that expression (7) is invariant under a change of
coordinate system when the $x_s$ are the coordinates of any $\psi$-vector. This is
sufficient to ensure that the $\bar a_s$ transform like the coordinates of a $\phi$-vector and
thus that the $\bar a_s$ are the coordinates of some $\phi$-vector. We call this vector $\phi_y\alpha$
and consider it as the symbolic product of the vector $\phi_y$ with an operator $\alpha$.
It is, in fact, the result of some linear operator operating on $\phi_y$, since, as shown
by equation (8), its coordinates are linear functions of the coordinates of $\phi_y$.
This linear operator provides an alternative representation for the observable
symmetrical with the previous linear operator operating on $\psi$-vectors.

The use of the same letter $\alpha$ for both operators makes a convenient notation
which does not lead to confusion. In fact, the two operators may be counted as
just one operator which can operate either to the right on a $\psi$-vector or to the left
on a $\phi$-vector. We now have a symbolic scheme in which the following associative
law of multiplication holds,

$$\phi_y(\alpha\psi_x) = (\phi_y\alpha)\psi_x.$$

Either of these quantities will be written in future as $\phi_y\alpha\psi_x$ without brackets.
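The associative law can be checked by computing the coordinates of $\phi_y\alpha$ from (8) and comparing both bracketings. This is a minimal Python sketch, assuming a hypothetical operator (not required to be Hermitian, since the associative law does not use that property) and illustrative coordinates; `apply_right` and `apply_left` are names chosen here.

```python
def apply_right(alpha, psi):
    """alpha psi: coordinates sum over s of alpha[r][s] * x_s, as in (5)."""
    return [sum(a * x for a, x in zip(row, psi)) for row in alpha]


def apply_left(phi, alpha):
    """phi alpha: coordinates sum over r of phi_r * alpha[r][s], as in (8)."""
    return [sum(phi[r] * alpha[r][s] for r in range(len(alpha)))
            for s in range(len(alpha[0]))]


def product(phi, psi):
    return sum(p * q for p, q in zip(phi, psi))


alpha = [[2 + 0j, 1j],
         [3 - 1j, 0.5 + 0j]]                      # hypothetical, not Hermitian
psi_x = [1 + 1j, -2 + 0j]                         # coordinates x_r
phi_y = [z.conjugate() for z in [0.5j, 1 + 2j]]   # coordinates conj(y_r) of phi_y

left = product(phi_y, apply_right(alpha, psi_x))  # phi_y (alpha psi_x)
right = product(apply_left(phi_y, alpha), psi_x)  # (phi_y alpha) psi_x
print(left, right)                                # the same number
```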

A number, considered as a multiplying factor into each $\psi$ or each $\phi$, is a special
case of a linear operator. It has the property of commuting with every linear 
operator. One can easily see that any linear operator that commutes with every 
linear operator is a number. 

One further question remains to be considered in this section. We are assuming 
that every observable can be represented by a linear operator. Does every 
linear operator represent some observable? One would immediately expect some 
restriction on the linear operator of the nature of a condition of reality, since, 
owing to the fact that a linear operator may be multiplied by an arbitrary complex 
number and remains a linear operator, the general linear operator must correspond 
to a complex function of the dynamical variables. Such a complex function may, 
of course, be considered formally as a complex observable, but since no meaning
can be attached to the measurement of a complex observable,* it is preferable
to restrict the word ‘observable’ to refer to real functions of dynamical variables
and to introduce a corresponding restriction on the linear operators that represent
observables.

We assume this restriction to be that the coordinates $\alpha_{rs}$ of each of these linear
operators satisfy the relations

$$\bar\alpha_{rs} = \alpha_{sr}. \qquad (9)$$

These are just the relations required for the matrix formed by the $\alpha_{rs}$ to be what
is called a Hermitian matrix. A linear operator that satisfies this condition may
conveniently be called a Hermitian operator.

In order to see that this assumption is suitable, we must verify that
the condition imposed by (9) is independent of the system of coordinates. An easy
way of doing this is by putting this condition in a form that does not refer
to any coordinate system. Introducing two arbitrary $\psi$-vectors, $\psi_x$ and $\psi_y$ say,
with coordinates $x_r$ and $y_r$, we have

$$\phi_x\alpha\psi_y = \sum_{r,s} \bar x_r\, \alpha_{rs}\, y_s$$

and

$$\phi_y\alpha\psi_x = \sum_{r,s} \bar y_r\, \alpha_{rs}\, x_s = \sum_{r,s} \bar y_s\, \alpha_{sr}\, x_r.$$

By (9), $\alpha_{sr} = \bar\alpha_{rs}$, so the last sum is the conjugate complex of the sum for
$\phi_x\alpha\psi_y$, and we obtain

$$\phi_x\alpha\psi_y = \overline{\phi_y\alpha\psi_x}. \qquad (10)$$

Conversely, if we are given (10) for arbitrary $\psi_x$ and $\psi_y$, we can deduce (9),
by taking $\psi_x$ and $\psi_y$ to be the unit vectors along the directions of the $r$-th and
$s$-th axes respectively. Thus the condition on $\alpha$ imposed by (9) is equivalent to that
imposed by (10), and since (10) contains no reference to any coordinate system, (9)
must also be independent of the coordinate system.
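Condition (9) and its coordinate-free form (10) can be compared numerically. The following is a minimal Python sketch, assuming one hypothetical Hermitian operator, one non-Hermitian operator for contrast, and illustrative vectors; `is_hermitian` and `sandwich` are helper names of this sketch. The last check takes unit vectors along the axes, as in the converse argument, and recovers the individual $\alpha_{rs}$.

```python
def is_hermitian(alpha, tol=1e-12):
    """Condition (9): conj(alpha_rs) == alpha_sr for every r, s."""
    n = len(alpha)
    return all(abs(alpha[r][s].conjugate() - alpha[s][r]) < tol
               for r in range(n) for s in range(n))


def sandwich(x, alpha, y):
    """phi_x alpha psi_y = sum over r, s of conj(x_r) alpha_rs y_s."""
    n = len(alpha)
    return sum(x[r].conjugate() * alpha[r][s] * y[s]
               for r in range(n) for s in range(n))


alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, -1 + 0j]]       # satisfies (9)
beta = [[0, 1],
        [0, 0]]                   # does not satisfy (9)

x = [1 + 2j, 0.5 + 0j]
y = [-1j, 3 + 1j]

print(is_hermitian(alpha), is_hermitian(beta))                                 # True False
print(abs(sandwich(x, alpha, y) - sandwich(y, alpha, x).conjugate()) < 1e-12)  # True: (10) holds
print(abs(sandwich(x, beta, y) - sandwich(y, beta, x).conjugate()) < 1e-12)    # False: (10) fails

# Unit vectors along the axes pick out single coordinates alpha_rs.
e = [[1 + 0j, 0j], [0j, 1 + 0j]]
print(all(abs(sandwich(e[r], alpha, e[s]) - alpha[r][s]) < 1e-12
          for r in range(2) for s in range(2)))                                # True
```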

As a corollary to this work we see, by putting $\alpha\psi_x = \psi_z$ in (10), that

$$\phi_x\alpha\psi_y = \overline{\phi_y\psi_z} = \phi_z\psi_y,$$

from (4). Since this holds for arbitrary $\psi_y$, we must have

$$\phi_z = \phi_x\alpha. \qquad (11)$$

Thus the $\psi$-vector $\alpha\psi_x$ has as its conjugate imaginary the $\phi$-vector $\phi_x\alpha$. This result
will be much used in the following work. It is true only on account of the operator $\alpha$
being Hermitian.

*It would not do to measure separately the real and† imaginary parts, because this would
mean two measurements, which in general would interfere with one another.
†[‘pure’ omitted]
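Result (11) can be illustrated by comparing the conjugate complex coordinates of $\alpha\psi_x$ with the coordinates of $\phi_x\alpha$ computed from (8). As a minimal sketch, assuming one hypothetical Hermitian operator and one non-Hermitian operator with illustrative coordinates, the Python below shows the two agreeing only in the Hermitian case.

```python
def apply_right(alpha, psi):
    """alpha psi, coordinates from (5)."""
    return [sum(a * x for a, x in zip(row, psi)) for row in alpha]


def apply_left(phi, alpha):
    """phi alpha, coordinates from (8)."""
    return [sum(phi[r] * alpha[r][s] for r in range(len(alpha)))
            for s in range(len(alpha[0]))]


def conj(v):
    return [z.conjugate() for z in v]


def close(u, v, tol=1e-12):
    return all(abs(p - q) < tol for p, q in zip(u, v))


alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, -1 + 0j]]     # Hermitian: satisfies (9)
beta = [[0j, 1 + 0j],
        [0j, 0j]]               # not Hermitian
x = [1 + 0j, 1j]

# Conjugate imaginary of alpha psi_x versus phi_x alpha, as in (11).
print(close(conj(apply_right(alpha, x)), apply_left(conj(x), alpha)))  # True
print(close(conj(apply_right(beta, x)), apply_left(conj(x), beta)))    # False
```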

The question now remains—Does every Hermitian operator represent 
an observable? The answer to this is that, provided we give a sufficiently 
comprehensive meaning to the word observable, to make it include all real functions 
of the dynamical variables that are theoretically measurable and not merely 
those for which a practicable method of measurement can be set up, most of 
the Hermitian operators ordinarily met with do represent observables, but there 
are exceptions. The remaining condition for a Hermitian operator to represent 
an observable will be given at the end of §10. 


9. Eigenvalues 


In the two preceding sections we made a number of assumptions about the way in 
which states and observables are to be represented mathematically in the theory. 
These assumptions are not, by themselves, laws of nature, but become laws 
of nature when we make some further assumptions that provide a physical 
interpretation of the theory. Such further assumptions must take the form of 
establishing connexions between physical facts, on the one hand, and the equations 
of the mathematical formalism on the other. 

One of these further assumptions is the following: In the special case when
the result of a particular observation made on the system in a particular state is
with certainty one particular number, $a$ say (instead of being one of two or more
numbers according to a probability law), then the Hermitian operator, $\alpha$ say,
representing the observable that is measured and the $\psi$-vector, $\psi_a$ say, representing
the state are connected by the equation

$$\alpha\psi_a = a\psi_a. \qquad (12)$$

Conversely, if this equation holds, a measurement of the observable represented by
$\alpha$ made on the system in the state represented by $\psi_a$ is certain to lead to the result $a$.

Equation (12) means that the linear operator $\alpha$, applied to the vector $\psi_a$,
just multiplies this vector by a numerical factor without changing its direction
(or possibly multiplies it by the factor zero, so that it ceases to have a definite
direction). This same $\alpha$ applied to other vectors will, of course, in general change
both their lengths and their directions. It should be noticed that only the direction
of $\psi_a$ is of importance in equation (12). If one multiplies $\psi_a$ by any number
not zero, it will not affect the question of whether $\psi_a$ satisfies equation (12)
or not. This, of course, is necessary in order that our assumption may be sensible,
since the state represented by $\psi_a$ depends only on the direction of $\psi_a$ and not on
its length.




There are some other matters which we must look into before we can be
sure that our assumption is reasonable. One of these concerns the reality of
the number $a$. Any result of a measurement is necessarily a real number. Is any
number $a$ satisfying an equation of the type (12) also necessarily real? We can
easily see that it is so when we make use of the Hermitian property of $\alpha$.
Multiplying (12) symbolically by $\phi_a$, the conjugate imaginary of $\psi_a$, on the left,
we get

$$\phi_a\alpha\psi_a = a\,\phi_a\psi_a.$$

Now $\phi_a\alpha\psi_a$ is a real number, as follows from equation (10) with $\psi_x$ and $\psi_y$ both
put equal to $\psi_a$, and further we saw, just after equation (4), that $\phi_a\psi_a$ must be real,
and indeed positive, since $\psi_a$ does not vanish.
Hence $a$ must be a real number.
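The argument can be followed numerically. This is a minimal Python sketch, assuming a hypothetical Hermitian operator and an eigen-$\psi$ of it that was checked by hand ($\alpha\psi_a = 4\psi_a$); neither comes from the text. Dividing the real number $\phi_a\alpha\psi_a$ by the positive number $\phi_a\psi_a$ returns the real eigenvalue.

```python
def apply(alpha, psi):
    return [sum(a * x for a, x in zip(row, psi)) for row in alpha]


def product(phi, psi):
    return sum(p * q for p, q in zip(phi, psi))


def conj(v):
    return [z.conjugate() for z in v]


alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]       # Hermitian, satisfies (9)
psi_a = [1 - 1j, 2 + 0j]         # hand-checked eigen-psi: alpha psi_a = 4 psi_a

# Equation (12) holds with a = 4.
assert all(abs(u - 4 * v) < 1e-12 for u, v in zip(apply(alpha, psi_a), psi_a))

numerator = product(conj(psi_a), apply(alpha, psi_a))  # phi_a alpha psi_a
denominator = product(conj(psi_a), psi_a)               # phi_a psi_a, real and positive
a = numerator / denominator
print(numerator, denominator, a)   # 24, 6 and 4, each with zero imaginary part
```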

Another point to be noticed is that our assumption does not disturb
the symmetry between $\psi$'s and $\phi$'s. We can, in fact, replace equation (12) in
the assumption by

$$\phi_a\alpha = a\phi_a, \qquad (13)$$

equations (12) and (13) being equivalent since they are just conjugate imaginary
equations, according to the rule deduced in the preceding section in connexion
with equation (11).

A further question to be looked into is the following. If we have any 
observable, we can multiply it by any real number k and get another observable. 
Now if a measurement of the original observable with the system in a particular 
state is certain to lead to one particular result a, we should require for physical 
consistency that a measurement of the new observable with the system in this same 
state shall certainly lead to the result ka. Is this given by the mathematical 
formalism, with the help of our assumption? It is easily seen that it is. 
If the original observable is represented by the operator $\alpha$, the new one must be
represented by $k\alpha$. The condition that a measurement of the original observable
shall certainly lead to the result $a$ when the system is in the state represented
by $\psi_a$ is equation (12), and from this equation we can deduce

$$(k\alpha)\psi_a = ka\,\psi_a,$$

from which we can infer that a measurement of the new observable will certainly
lead to the result $ka$ for this same state.

The above question is a special case of a more general one. We may take 
as new observable any function f of the original one, instead of just k times it, 
and we should then require for physical consistency that a measurement of the new 
observable shall certainly lead to the result f(a). This also is deducible from 
the mathematical formalism in an elementary way, provided the function f is 
expressible as a power series. The general case, where the function f may not
be expressible as a power series, will be dealt with in §11, where the requirement
we are now discussing will be used as a basis for a general mathematical definition
of a function of an observable.
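For a function given by a power series (here a polynomial), $f(\alpha)$ is built from sums and products of operators only, and applying it to an eigen-$\psi$ gives $f(a)$ times that vector. The sketch below assumes a hypothetical Hermitian operator, a hand-checked eigen-$\psi$ with eigenvalue 4, and the illustrative choice $f(x) = 1 + 2x + x^3$; the special case $k\alpha$ is the series with coefficients `[0, k]`.

```python
def apply(alpha, psi):
    return [sum(a * x for a, x in zip(row, psi)) for row in alpha]


def apply_power_series(coeffs, alpha, psi):
    """f(alpha) psi for f(x) = coeffs[0] + coeffs[1] x + coeffs[2] x**2 + ...,
    using only sums and repeated application of alpha."""
    result = [0j] * len(psi)
    power = list(psi)                      # alpha**0 psi
    for c in coeffs:
        result = [r + c * p for r, p in zip(result, power)]
        power = apply(alpha, power)        # next power of alpha applied to psi
    return result


alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]                 # Hermitian
psi_a = [1 - 1j, 2 + 0j]                   # hand-checked eigen-psi, eigenvalue a = 4
coeffs = [1, 2, 0, 1]                      # f(x) = 1 + 2x + x**3, chosen for illustration
f_a = sum(c * 4 ** k for k, c in enumerate(coeffs))   # f(4) = 73

lhs = apply_power_series(coeffs, alpha, psi_a)
print(all(abs(u - f_a * v) < 1e-12 for u, v in zip(lhs, psi_a)))   # True
```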

Equation (12) is of such fundamental importance in the theory that it is
desirable to introduce some special words to describe the relationships between
the quantities involved. We shall call $a$ an eigenvalue† of the operator $\alpha$ or of
the observable that $\alpha$ represents and $\psi_a$ an eigen-$\psi$ of this operator or observable,
and we shall say that the eigen-$\psi$ $\psi_a$ belongs to the eigenvalue $a$. Likewise,
$\phi_a$ satisfying (13) is an eigen-$\phi$ belonging to the eigenvalue $a$, and the state
represented by either $\psi_a$ or $\phi_a$ is an eigenstate belonging to this eigenvalue.
This terminology may also be used when the linear operator $\alpha$ is not Hermitian
and does not represent an observable.

Our assumption now enables us to infer that every eigenvalue of an observable
is a possible result of the measurement of that observable. It is certainly the result
when the system is in an eigenstate belonging to the eigenvalue. The converse
theorem, that every possible result of the measurement of an observable,
with the system in any state whatever, is one of its eigenvalues, is also true and
will be deduced in §12 from a more general assumption for physical interpretation.
The eigenvalues of an observable are just the possible results of measurements
of that observable, and the calculation of eigenvalues is thus an important practical
problem.

A real number k is a special case of a Hermitian operator. Its peculiar 
characteristic from our present standpoint is that it has just one eigenvalue, 
namely k, and every w-vector is an eigen-w belonging to this eigenvalue. 
Such an operator may be considered to represent an observable. Any measurement 
of the observable must then always lead to the same result, namely k, no matter 
what state the system is in. This means that the observable is a natural constant, 
such as the velocity of light or the charge of an electron, or perhaps just a number. 

The theorem will now be proved that two eigenstates belonging to two different
eigenvalues of an observable are orthogonal, in the sense defined at the end
of §7. Suppose the two eigenstates are represented by the eigen-$\psi$'s $\psi_1$ and $\psi_2$,
the corresponding eigenvalues being $a_1$ and $a_2$ respectively. Then, if $\alpha$ represents
the observable, we have

$$\alpha\psi_1 = a_1\psi_1, \qquad (14)$$

and

$$\phi_2\alpha = a_2\phi_2, \qquad (15)$$

the second of these equations being of the type (13). Multiplying (14) by $\phi_2$ on
the left-hand side and (15) by $\psi_1$ on the right-hand side, we obtain

$$\phi_2\alpha\psi_1 = a_1\phi_2\psi_1$$

and

$$\phi_2\alpha\psi_1 = a_2\phi_2\psi_1$$

respectively. Hence

$$a_1\phi_2\psi_1 = a_2\phi_2\psi_1,$$

so that if $a_1$ is not equal to $a_2$, $\phi_2\psi_1 = 0$ and the two states are orthogonal.
We shall call this theorem the orthogonality theorem.

†The word ‘proper’ is sometimes used instead of ‘eigen’, but this is not satisfactory as
the words ‘proper’ and ‘improper’ are so often used with other meanings. See, for example,
§20 and also p. 174.
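A numerical instance of the orthogonality theorem: the sketch below assumes a hypothetical Hermitian $2 \times 2$ operator whose two eigen-$\psi$'s (eigenvalues 1 and 4) were found by hand, checks equation (12) for each, and confirms that $\phi_2\psi_1$ vanishes. The helper `is_eigen` is an illustrative name, not notation from the text.

```python
def apply(alpha, psi):
    return [sum(a * x for a, x in zip(row, psi)) for row in alpha]


def product(phi, psi):
    return sum(p * q for p, q in zip(phi, psi))


def conj(v):
    return [z.conjugate() for z in v]


def is_eigen(alpha, psi, a, tol=1e-12):
    """Checks equation (12): alpha psi = a psi."""
    return all(abs(u - a * v) < tol for u, v in zip(apply(alpha, psi), psi))


alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]              # Hermitian
psi_1, a_1 = [-1 + 1j, 1 + 0j], 1       # hand-checked eigen-psi's
psi_2, a_2 = [1 - 1j, 2 + 0j], 4

assert is_eigen(alpha, psi_1, a_1) and is_eigen(alpha, psi_2, a_2)

phi_2 = conj(psi_2)
print(product(phi_2, psi_1))    # zero: psi_1 and psi_2 are orthogonal
```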

If $\psi_1$ and $\psi_2$ are two eigen-$\psi$'s belonging to the same eigenvalue, then it is
easily seen that any linear combination of them, $c_1\psi_1 + c_2\psi_2$, is also an eigen-$\psi$
belonging to this eigenvalue. Physically, this means that if we take two states such 
that a measurement of some observable with the system in either of these states 
is certain to lead to one particular result, then a measurement of this observable 
with the system in any state formed by the superposition of the two states will also 
certainly lead to this same result. This gives us some understanding of the physical 
meaning of superposition. 

It can easily be proved that no linear combination of eigen-ψ’s belonging to
different eigenvalues can be an eigen-ψ, i.e., that eigen-ψ’s belonging to different
eigenvalues are all necessarily independent. If this were not so, we should have
a relation of the type

$$\sum_r c_r\psi_r = 0, \qquad (16)$$

with numerical coefficients $c_r$, between a number of eigen-ψ’s belonging to different
eigenvalues. In this relation we can assume, without loss of generality, that there
is only one term corresponding to any eigenvalue, since if there were several
terms corresponding to the same eigenvalue, these terms could be lumped together
to form a single term, which would still be an eigen-ψ. If we now multiply (16) on
the left by $\bar{c}_s\phi_s$, the conjugate imaginary of one of the terms, we get, on account
of the orthogonality theorem proved above,

$$\bar{c}_s\phi_s\,c_s\psi_s = 0.$$

The left-hand side is the square of the length of the vector $c_s\psi_s$, so this gives $c_s\psi_s = 0$.
Hence each term in (16) must vanish separately.
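Independence can likewise be illustrated numerically. Assuming NumPy and a 4×4 real symmetric matrix of our own choosing (tridiagonal with nonzero off-diagonal elements, so its four eigenvalues are distinct), a non-trivial relation of the type (16) would make the matrix whose columns are the eigen-ψ’s rank-deficient. The sketch rescales each eigen-ψ by an arbitrary factor and checks that the rank stays full.

```python
import numpy as np

# Illustrative Hermitian (real symmetric) 4x4 matrix with four distinct eigenvalues.
alpha = np.array([[1.0, 0.5, 0.0, 0.0],
                  [0.5, 2.0, 0.5, 0.0],
                  [0.0, 0.5, 3.0, 0.5],
                  [0.0, 0.0, 0.5, 4.0]])
eigvals, eigvecs = np.linalg.eig(alpha)   # general solver: no orthogonality imposed

# Rescale each eigen-psi by an arbitrary nonzero factor; independence must survive this.
scales = np.array([2.0, -0.5, 3j, 1.0 + 1j])
psis = eigvecs * scales                   # column r is still an eigen-psi for eigvals[r]

# A relation sum_r c_r psi_r = 0 with some c_r != 0 would lower the rank below 4.
rank = np.linalg.matrix_rank(psis)
print("eigenvalues:", np.round(np.sort(eigvals.real), 4))
print("rank of [psi_1 ... psi_4]:", rank)
```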


10. The Expansion Theorem 


In the preceding section we discussed the eigenvalues and eigen-ψ’s of an observable
or Hermitian operator, assuming that these eigenvalues and eigen-ψ’s exist.
A question that we did not consider is whether, if we take some particular
observable or Hermitian operator, it will have any eigenvalues and eigen-ψ’s at all,
and if so how many.

This question can easily be answered in the case of a dynamical system that
has only a finite number of independent states, so that the space of ψ-vectors
has only a finite number of dimensions.† If we introduce a system of coordinates
in this space, the condition (12) for an eigenvalue and eigen-ψ of a Hermitian
operator $\alpha$ becomes

$$\sum_s \alpha_{rs}x_s = a\,x_r, \qquad (17)$$

where the $\alpha_{rs}$ are the coordinates of the linear operator and the $x_r$ are those
of the eigen-ψ. Let us suppose that the Hermitian operator $\alpha$ is given, so that
the $\alpha_{rs}$ are given, and the $x_r$ and $a$ are unknowns which we must try to choose
to satisfy the conditions (17). We then have, if $n$ is the number of dimensions
of the vector space, $n$ equations for the $n + 1$ unknowns $x_r$, $a$. Written in full,
these equations are


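$$
\begin{aligned}
(\alpha_{11} - a)x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \cdots + \alpha_{1n}x_n &= 0,\\
\alpha_{21}x_1 + (\alpha_{22} - a)x_2 + \alpha_{23}x_3 + \cdots + \alpha_{2n}x_n &= 0,\\
\alpha_{31}x_1 + \alpha_{32}x_2 + (\alpha_{33} - a)x_3 + \cdots + \alpha_{3n}x_n &= 0,\\
&\;\;\vdots\\
\alpha_{n1}x_1 + \alpha_{n2}x_2 + \alpha_{n3}x_3 + \cdots + (\alpha_{nn} - a)x_n &= 0.
\end{aligned}
\qquad (18)
$$

The $n$ variables $x_r$ occur in these equations linearly and homogeneously and we may
eliminate them, obtaining the determinantal equation for $a$,

$$
\begin{vmatrix}
\alpha_{11} - a & \alpha_{12} & \alpha_{13} & \cdots & \alpha_{1n}\\
\alpha_{21} & \alpha_{22} - a & \alpha_{23} & \cdots & \alpha_{2n}\\
\alpha_{31} & \alpha_{32} & \alpha_{33} - a & \cdots & \alpha_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\alpha_{n1} & \alpha_{n2} & \alpha_{n3} & \cdots & \alpha_{nn} - a
\end{vmatrix} = 0. \qquad (19)
$$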


This is an algebraic equation of the $n$-th degree in $a$ and must have $n$ roots,
not necessarily all different. Each of these roots is an eigenvalue and the eigen-ψ
belonging to it may be obtained from (18). When two or more of the roots
of (19) coincide at some particular value, $a_1$ say, then the eigenvalue $a_1$ must
have a number of independent eigen-ψ’s belonging to it equal to the number
of these coincident roots. This result may be proved by algebraic methods,
but one can also see that it must be true from elementary considerations of
continuity. Suppose small variations, of the order of magnitude of $\epsilon$, to be made in
the matrix elements $\alpha_{rs}$ in such a way as not to destroy the Hermitian character of
the matrix and so as to separate all the roots, $m$ in number, say, that previously
coincided at $a_1$. These roots will then differ from one another and from $a_1$ by
quantities of the order of $\epsilon$. Each of them will have some eigen-ψ belonging to it and
these eigen-ψ’s will all be orthogonal to one another, by the orthogonality theorem
of the preceding section. These eigen-ψ’s will define an $m$-dimensional sub-space,
which contains them all. Any ψ in this sub-space (with a length of the order unity)
will satisfy the condition for being an eigen-ψ of the original Hermitian operator $\alpha$
belonging to the eigenvalue $a_1$, with an error of the order $\epsilon$. By now making $\epsilon \to 0$,
we obtain in the limit an $m$-dimensional sub-space of ψ’s and thus $m$ independent
ψ’s, each of which is an eigen-ψ of $\alpha$ belonging to the eigenvalue $a_1$.

†The problem is then essentially the same as that of the transformation of a quadratic form
to principal axes. See Courant and Hilbert, Methoden der mathematischen Physik, Chapter I.
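The determinantal equation and the continuity argument can both be followed numerically in a small case. The sketch below assumes NumPy and an illustrative 4×4 real symmetric operator built to have the eigenvalue $a_1 = 1$ repeated three times; the perturbation `H` and the size `eps` are arbitrary choices of ours. It counts the coincident roots of (19), compares that count with the number of independent eigen-ψ’s, and then shows that after a small Hermitian variation the eigen-ψ’s of the separated roots still satisfy the eigen-equation for $a_1$ up to an error of order $\epsilon$.

```python
import numpy as np

# Illustrative operator with eigenvalue a1 = 1 repeated m = 3 times (our choice):
# alpha = U diag(1, 1, 1, 3) U^T with U orthogonal.
U, _ = np.linalg.qr(np.array([[4.0, 1.0, 0.0, 1.0],
                              [1.0, 3.0, 1.0, 0.0],
                              [0.0, 1.0, 4.0, 1.0],
                              [1.0, 0.0, 1.0, 5.0]]))
alpha = U @ np.diag([1.0, 1.0, 1.0, 3.0]) @ U.T
n, a1 = alpha.shape[0], 1.0

# Roots of the determinantal equation (19); np.poly gives the coefficients of
# det(lambda*I - alpha). A triple root is numerically fragile, hence the loose tolerance.
roots = np.roots(np.poly(alpha))
m = int(np.sum(np.abs(roots - a1) < 1e-3))

# Independent eigen-psi's belonging to a1 = dimension of the null space of (alpha - a1).
n_indep = n - np.linalg.matrix_rank(alpha - a1 * np.eye(n), tol=1e-8)

# Continuity argument: a small Hermitian variation of order eps separates the m roots.
eps = 1e-6
H = np.array([[0.0, 1.0, 2.0, 0.0],
              [1.0, 0.0, 0.0, 1.0],
              [2.0, 0.0, 1.0, 3.0],
              [0.0, 1.0, 3.0, 0.0]])
vals, vecs = np.linalg.eigh(alpha + eps * H)
near = np.abs(vals - a1) < 1e-3
sub = vecs[:, near]                              # orthonormal basis of the m-dim sub-space
error = np.linalg.norm(alpha @ sub - a1 * sub)   # small, of the order of eps
print(m, n_indep, int(near.sum()), error)
```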

It should be noted that the argument makes use of the Hermitian property of
$\alpha$ in using the orthogonality theorem and nowhere else. It is necessary to use
this theorem, since otherwise some of the eigen-ψ’s of the varied Hermitian
operator, belonging to eigenvalues near $a_1$, might be inclined to each other at
angles of the order $\epsilon$ and tend to coincidence as $\epsilon \to 0$, in which case the argument
would fail. A simple example of such failure is obtained if we take the non-Hermitian matrix

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},$$

whose determinantal equation $a^2 = 0$ has the double root $a = 0$, although it has only
one independent eigen-ψ.
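The failure for this non-Hermitian matrix can be seen directly. In the sketch below (NumPy assumed), the variation `[[0, 0], [eps, 0]]` is one illustrative choice of ours; the varied matrix has eigenvalues $\pm\sqrt{\epsilon}$, and its two eigenvectors close up on each other as $\epsilon \to 0$ (the angle between them is $2\arctan\sqrt{\epsilon}$), so in the limit they span only a one-dimensional space.

```python
import numpy as np

N = np.array([[0.0, 1.0],
              [0.0, 0.0]])          # the non-Hermitian matrix of the text

print("roots of (19):", np.roots(np.poly(N)))   # the double root 0
nullity = 2 - np.linalg.matrix_rank(N)          # only ONE independent eigenvector

def eigvec_angle(eps):
    """Angle between the two eigenvectors of the varied (still non-Hermitian)
    matrix [[0, 1], [eps, 0]], whose eigenvalues are +sqrt(eps) and -sqrt(eps)."""
    vals, vecs = np.linalg.eig(N + np.array([[0.0, 0.0], [eps, 0.0]]))
    v1, v2 = vecs[:, 0], vecs[:, 1]
    cos = abs(np.vdot(v1, v2)) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.arccos(min(cos, 1.0))

for eps in (1e-2, 1e-4, 1e-6):
    print(f"eps = {eps:.0e}: angle = {eigvec_angle(eps):.3e}")
```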


We now have the result that, when the number of dimensions of the ψ-space
is finite and equal to $n$, the number of independent eigen-ψ’s of any Hermitian
operator is also $n$. Hence an arbitrary ψ can be expressed linearly in terms of
these eigen-ψ’s, thus

$$\psi = \sum_a \psi_a, \qquad (20)$$

where ψ is arbitrary and each $\psi_a$ is an eigen-ψ. This is the expansion theorem for
the case when the number of dimensions of the ψ-space is finite.
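In a finite number of dimensions the expansion (20) can be computed explicitly. The helper `expand_in_eigenvectors` below is our own illustrative name, not from the text; it assumes NumPy and a Hermitian matrix, lumps eigen-ψ’s of a repeated eigenvalue into a single term as in §9, and returns one component $\psi_a$ per distinct eigenvalue $a$.

```python
import numpy as np

def expand_in_eigenvectors(alpha, psi, tol=1e-9):
    """Split psi into pieces psi_a, one per distinct eigenvalue a of the Hermitian
    matrix alpha, with psi = sum_a psi_a and alpha psi_a = a psi_a, as in (20).
    (Hypothetical helper; eigen-psi's of a repeated eigenvalue are lumped together.)"""
    vals, vecs = np.linalg.eigh(alpha)
    parts = {}
    for a, v in zip(vals, vecs.T):
        key = next((k for k in parts if abs(k - a) < tol), a)
        parts[key] = parts.get(key, 0) + v * np.vdot(v, psi)   # component of psi along v
    return parts

# Illustrative 4-dimensional example with eigenvalues 1, 1, 2, 5 (our choice).
U, _ = np.linalg.qr(np.array([[4.0, 1.0, 0.0, 1.0],
                              [1.0, 3.0, 1.0, 0.0],
                              [0.0, 1.0, 4.0, 1.0],
                              [1.0, 0.0, 1.0, 5.0]]))
alpha = U @ np.diag([1.0, 1.0, 2.0, 5.0]) @ U.T
psi = np.array([1.0, -2j, 0.5, 3.0])          # an arbitrary psi

parts = expand_in_eigenvectors(alpha, psi)
print({round(float(a), 6): np.round(p, 3) for a, p in parts.items()})
```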

We must now go over to the case of an infinite number of dimensions.
The expansion theorem still reads: an arbitrary ψ can be expressed linearly in
terms of the eigen-ψ’s of a Hermitian operator; but the theorem can no longer
always be written in the form (20), since the number of independent eigen-ψ’s
may be more than enumerable. This happens when the eigenvalues include all
numbers in a certain range, say all numbers from $p$ to $q$. (It may quite possibly
be from $-\infty$ to $+\infty$.) The expansion may then take the form of an integral,

$$\psi = \int \psi_a\, da, \qquad (21)$$

where $a$ is the eigenvalue to which $\psi_a$ belongs and $\psi_a$ varies with $a$ in such a way
that the integral exists. This form of expansion, though, does not include all cases.
We shall take up this question again in §20.

A rigorous proof of the expansion theorem, sufficiently general to cover all
the cases for which it is required in quantum theory, has not yet been found
[as of the date of publication, 1935]. The following argument, however,
makes the theorem appear plausible.

Let $\alpha$ be the Hermitian operator and consider a ψ-vector $\psi_\tau$ that is a function
of the parameter $\tau$ and satisfies the differential equation

$$\frac{d\psi_\tau}{d\tau} = i\alpha\psi_\tau. \qquad (22)$$

If $\psi_\tau$ is given for one value of $\tau$, then it is fixed by this equation for a slightly
greater value of $\tau$. Thus we should expect this equation to have one solution,
and only one, for any given initial value for $\psi_\tau$, i.e., for $\psi_\tau$ equal to an arbitrary $\psi_0$
when $\tau = 0$. Suppose now that this solution can be expressed as a Fourier series
or integral in $\tau$; thus, if we take for definiteness the case of the integral,

$$\psi_\tau = \int e^{ia\tau}\psi_a\, da, \qquad (23)$$

where $\psi_a$ is independent of $\tau$, but involves the new parameter $a$. Substituting
this expression for $\psi_\tau$ in (22), we obtain

$$\int ia\,e^{ia\tau}\psi_a\, da = i\alpha\int e^{ia\tau}\psi_a\, da,$$

or

$$\int e^{ia\tau}a\psi_a\, da = \int e^{ia\tau}\alpha\psi_a\, da.$$

Since this equation holds for all values of $\tau$, we can equate coefficients of $e^{ia\tau}$,
which gives

$$\alpha\psi_a = a\psi_a.$$

Thus $\psi_a$ is an eigen-ψ of $\alpha$ belonging to the eigenvalue $a$. If we now put $\tau = 0$
in (23), we obtain

$$\psi_0 = \int \psi_a\, da,$$

which gives an expansion for the arbitrary $\psi_0$ in terms of the eigen-ψ’s $\psi_a$ in
the form (21). If $\psi_\tau$ were expressible as a Fourier series instead of the Fourier
integral (23) we should get an expansion of $\psi_0$ as a sum of eigen-ψ’s in the form (20).

The weak point in the above argument is the assumption of a Fourier expansion
for $\psi_\tau$. One case of failure would arise if the length of the vector $\psi_\tau$ increased
to infinity as $\tau \to \infty$, but this possibility can be ruled out with the help of
the Hermitian condition for $\alpha$, which condition we have not yet used. Let $\phi_\tau$ be
the conjugate imaginary of $\psi_\tau$, satisfying the conjugate imaginary equation to (22),
which is

$$\frac{d\phi_\tau}{d\tau} = -i\phi_\tau\alpha,$$

according to the theorem deduced in §8 in connexion with equation (11). Then

$$\frac{d}{d\tau}(\phi_\tau\psi_\tau) = \phi_\tau\frac{d\psi_\tau}{d\tau} + \frac{d\phi_\tau}{d\tau}\psi_\tau = \phi_\tau(i\alpha\psi_\tau) - (i\phi_\tau\alpha)\psi_\tau = 0. \qquad (24)$$

Thus the vector $\psi_\tau$ remains of constant length and cannot increase to infinity.
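The plausibility argument can be traced numerically in finite dimensions, where the solution of (22) is exactly a Fourier sum $\psi_\tau = \sum_a e^{ia\tau}\psi_a$. The sketch assumes NumPy; `evolve`, the matrices and the initial vector are illustrative choices of ours. It checks that the Fourier sum satisfies (22), that $\phi_\tau\psi_\tau$ stays constant as in (24) when $\alpha$ is Hermitian, and that the length can grow without bound for a non-Hermitian matrix such as `diag(1, -i)`.

```python
import numpy as np

def evolve(alpha, psi0, tau):
    """Solution of d(psi)/d(tau) = i alpha psi with psi(0) = psi0, written as the
    Fourier sum psi_tau = sum_a exp(i a tau) psi_a (alpha assumed diagonalizable)."""
    vals, vecs = np.linalg.eig(alpha)
    coeffs = np.linalg.solve(vecs, psi0)        # psi0 = sum_a coeffs[a] * vecs[:, a]
    return vecs @ (np.exp(1j * vals * tau) * coeffs)

alpha = np.array([[1.0, 2.0 - 1j],
                  [2.0 + 1j, -1.0]])            # Hermitian (illustrative)
psi0 = np.array([1.0, 1j]) / np.sqrt(2)

# 1) The Fourier sum solves (22): central difference versus i alpha psi_tau.
tau, h = 0.7, 1e-5
deriv = (evolve(alpha, psi0, tau + h) - evolve(alpha, psi0, tau - h)) / (2 * h)
ode_residual = np.linalg.norm(deriv - 1j * alpha @ evolve(alpha, psi0, tau))

# 2) (24): phi_tau psi_tau is constant for Hermitian alpha ...
norms = [np.vdot(evolve(alpha, psi0, t), evolve(alpha, psi0, t)).real for t in (0, 1, 10, 100)]

# ... but not for a non-Hermitian operator, where the length can grow without bound.
alpha_nh = np.diag([1.0, -1j])
norms_nh = [np.vdot(evolve(alpha_nh, psi0, t), evolve(alpha_nh, psi0, t)).real for t in (0, 1, 10)]
print("ODE residual:", ode_residual)
print("Hermitian:", np.round(norms, 12), "non-Hermitian:", norms_nh)
```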

The argument is not rigorous even yet, since the vector $\psi_\tau$ might behave in other
odd ways which would make its Fourier expansion impossible. In fact the expansion 
theorem is not true for every Hermitian operator, although it is true for most of 
the Hermitian operators met with in quantum mechanics. This results in lack of 
rigour in the theory from the mathematical standpoint, since we shall continually 
be requiring to use the expansion theorem in cases where it has not been proved. 
The situation is not so bad though, because there are usually physical grounds for 
telling when an application of the expansion theorem is permissible. The consistent 
development and physical interpretation of quantum mechanics require us to make 
the assumption that only those Hermitian operators that satisfy the expansion 
theorem represent observables and thus that an arbitrary state is dependent on 
the eigenstates of any observable. Hence, if we know that a certain Hermitian 
operator represents some dynamical quantity which can be observed (for example 
if it represents the energy of some system) we may use the expansion theorem for 
this operator without fear of getting into error. Those linear operators appearing 
in the theory which do not represent observables will still represent functions of 
the dynamical variables, though these functions will not be directly observable. 


11. Functions of an Observable 


With the help of the algebraic operations of addition and multiplication we can 
give a meaning to those functions of linear operators that are expressible as power 
series and thus to the corresponding functions of observables. We can, however, 
get a much more general definition of a function of an observable by following out 
a suggestion mentioned in §9.

Suppose we have any observable, represented by the Hermitian operator $\alpha$,
and let $\psi_a$ be one of its eigen-ψ’s, belonging to the eigenvalue $a$, so that

$$\alpha\psi_a = a\psi_a. \qquad (25)$$




A measurement of the observable when the system is in the state represented
by $\psi_a$ must certainly lead to the result $a$. We now require that a measurement of
a function $f$ of the observable, when the system is in this same state, shall certainly
lead to the result $f(a)$, $f$ being any function such that $f(a)$ has a meaning and
is real. Thus we should expect the function of the observable to be represented by
some linear operator $f(\alpha)$ that satisfies

$$f(\alpha)\psi_a = f(a)\psi_a. \qquad (26)$$

We take the condition that (26) always holds when (25) holds as
the mathematical definition of $f(\alpha)$. It is easily seen that this definition is
self-consistent when applied to a set of eigen-ψ’s of $\alpha$ which are not independent,
since such eigen-ψ’s must all belong to the same eigenvalue. Thus, if we take for
example three such eigen-ψ’s, $\psi_{1a}$, $\psi_{2a}$ and $\psi_{3a}$, connected by the relation

$$\psi_{1a} = \psi_{2a} + \psi_{3a},$$

the definition would give us the same result if we operate on $\psi_{1a}$ with $f(\alpha)$ as
if we operate on $\psi_{2a}$ and $\psi_{3a}$ and add. Also, the definition completely fixes
the linear operator $f(\alpha)$, since it allows us to obtain the result of $f(\alpha)$ operating on
an arbitrary ψ-vector. We have only to expand the arbitrary ψ-vector in terms of
eigen-ψ’s of $\alpha$, which expansion must be possible since $\alpha$ represents an observable,
and then to operate with $f(\alpha)$ on each term separately in the expansion.†

We must now verify that the above-defined linear operator $f(\alpha)$ can represent
an observable. Evidently $f(\alpha)$ satisfies the expansion theorem, since, as we see
from (25) and (26), every eigen-ψ of $\alpha$ is an eigen-ψ of $f(\alpha)$. It therefore only
remains for us to verify the Hermitian condition. We can do this most conveniently
by taking the form (10) for the Hermitian condition. Expanding the arbitrary $\phi_x$
and $\psi_y$ in terms of eigen-φ’s and eigen-ψ’s of $\alpha$ respectively,‡ we get equations of
the form

$$\phi_x = \sum_a \phi_{xa}, \qquad \psi_y = \sum_{a'} \psi_{ya'},$$


†We are here, for definiteness, taking the case when the expansion is in the form of a sum,
as in (20), and not an integral, as in (21). We shall be continually doing this in the rest of
this chapter and the next one. The change from sums to integrals involves only formal alterations
in the theory, which alterations will be dealt with in Chapter IV.

‡An expansion in terms of eigen-φ’s must, of course, be possible whenever the corresponding
expansion for eigen-ψ’s is possible. In these expansions we may assume, without loss of generality,
that there is only one term corresponding to any eigenvalue, since if there were more than one,
they could be lumped together to form a single term.




where $\phi_{xa}$ is an eigen-φ belonging to the eigenvalue $a$ and $\psi_{ya'}$ is an eigen-ψ
belonging to the eigenvalue $a'$. Hence, with the help of (26),

$$\phi_x f(\alpha)\psi_y = \sum_a \phi_{xa}\, f(\alpha)\sum_{a'}\psi_{ya'} = \sum_a \phi_{xa}\sum_{a'} f(a')\,\psi_{ya'} = \sum_a f(a)\,\phi_{xa}\psi_{ya}, \qquad (27)$$

the orthogonality theorem being used in the last step. Interchanging the suffixes $x$
and $y$, we get

$$\phi_y f(\alpha)\psi_x = \sum_a f(a)\,\phi_{ya}\psi_{xa}.$$

Now $f(a)$ is, by hypothesis, a real number and from (4)

$$\phi_{xa}\psi_{ya} = \overline{\phi_{ya}\psi_{xa}},$$

so that

$$\phi_x f(\alpha)\psi_y = \overline{\phi_y f(\alpha)\psi_x}.$$

Hence $f(\alpha)$ satisfies the condition that $\alpha$ satisfies in (10) and must therefore
be Hermitian.

We can now assume that the linear operator $f(\alpha)$ represents that observable
which is the function $f$ of the observable represented by $\alpha$. In this way we can give
a meaning to any real function f of an observable, provided only that the domain 
of existence of the function of a real variable f(x) includes all the eigenvalues 
of the observable. If the domain of existence contains other points besides 
these eigenvalues, then the values of f(x) for these other points will not affect 
the function of the observable. The function need not be analytic or continuous. 
The eigenvalues of a function f of an observable are just the function f of 
the eigenvalues of the observable. 
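For a finite Hermitian matrix the definition (26) can be implemented directly: expand in eigen-ψ’s, multiply each term by $f(a)$ and recombine. The sketch below assumes NumPy; `function_of_observable` is our own hypothetical helper name. It checks that the definition agrees with the power series for $f(x) = x^2$, and that a discontinuous real function such as the sign function still gives a Hermitian operator whose eigenvalues are $f$ of the eigenvalues of $\alpha$.

```python
import numpy as np

def function_of_observable(alpha, f):
    """The operator f(alpha) fixed by (26): f(alpha) psi_a = f(a) psi_a for every
    eigen-psi of the Hermitian matrix alpha. (Hypothetical helper name.)"""
    vals, vecs = np.linalg.eigh(alpha)
    # vecs.conj().T expands a vector in eigen-psi's, the diagonal multiplies each
    # term by f(a), and vecs recombines the terms.
    return vecs @ np.diag([f(a) for a in vals]) @ vecs.conj().T

alpha = np.array([[1.0, 2.0 - 1j],
                  [2.0 + 1j, -1.0]])      # Hermitian, eigenvalues -sqrt(6) and +sqrt(6)

square = function_of_observable(alpha, lambda x: x**2)   # compare with alpha @ alpha
sign = function_of_observable(alpha, np.sign)            # neither analytic nor continuous
print(np.allclose(square, alpha @ alpha))
print(np.allclose(sign, sign.conj().T), np.round(np.linalg.eigvalsh(sign), 12))
```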

It is important to remember that the possibility of defining a function f of 
an observable requires the existence of a unique number f(x) for each value of x 
which is an eigenvalue of the observable. Thus the function must be single-valued 
and the function idea which we use corresponds to the one in the theory of functions 
of a real variable, rather than the one in the theory of functions of a complex 
variable. This may be illustrated by considering the question: When we have 
an observable f(A) which is a function of the observable A, is the observable A 
a function of the observable f(A)? The answer to this is yes, if different 
eigenvalues $a$ of $A$ always correspond to different values of $f(a)$. If, however,
there exist two different eigenvalues of $A$, $a_1$ and $a_2$ say, such that $f(a_1) = f(a_2)$,
then, corresponding to the eigenvalue $f(a_1)$ of the observable $f(A)$, there will not
be a unique eigenvalue of the observable $A$ and the latter will not be a function of
the observable $f(A)$.
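The question whether $A$ is a function of $f(A)$ can be illustrated with diagonal matrices. In this sketch (NumPy assumed; the matrices and the helper name are illustrative), two different observables have the same square because the eigenvalues $1$ and $-1$ are sent to the same value, so $A$ cannot be recovered from $A^2$; for the one-to-one function $x^3$, the real cube root does recover $A$.

```python
import numpy as np

def function_of_observable(alpha, f):
    """f(alpha) defined by f(alpha) psi_a = f(a) psi_a (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(alpha)
    return vecs @ np.diag([f(a) for a in vals]) @ vecs.conj().T

A = np.diag([1.0, -1.0, 2.0])          # f(x) = x^2 takes the same value at 1 and -1
A_other = np.diag([-1.0, 1.0, 2.0])    # a different observable ...
square = lambda x: x**2
same_square = np.allclose(function_of_observable(A, square),
                          function_of_observable(A_other, square))   # ... same f(A)

# For the one-to-one function g(x) = x^3, A is recovered from g(A) by the real cube root.
A_back = function_of_observable(function_of_observable(A, lambda x: x**3), np.cbrt)
print(same_square, np.allclose(A_back, A))
```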




It may easily be verified mathematically, from the definition, that the sum or 
product of two functions of an observable is a function of that observable and that 
a function of a function of an observable is a function of that observable. Also it is 
easily seen that the whole theory is symmetrical between φ’s and ψ’s and that
we could equally well work from the equations

$$\phi_a\alpha = a\phi_a,$$
$$\phi_a f(\alpha) = f(a)\phi_a \qquad (28)$$

instead of from (25) and (26).

We shall conclude this section with a discussion of two examples which 
are of great practical importance, namely the reciprocal and the square root. 
The reciprocal of an observable exists when the observable does not have 
the eigenvalue zero. If the observable is represented by the Hermitian
operator $\alpha$, the reciprocal observable will be represented by a Hermitian operator,
which we call $\alpha^{-1}$ or $1/\alpha$, satisfying

$$\alpha^{-1}\psi_a = a^{-1}\psi_a, \qquad (29)$$

where $\psi_a$ is an eigen-ψ of $\alpha$ belonging to the eigenvalue $a$. Hence

$$\alpha\alpha^{-1}\psi_a = \alpha a^{-1}\psi_a = \psi_a.$$

Since this holds for any eigen-ψ $\psi_a$ and an arbitrary ψ can be expanded in terms
of these eigen-ψ’s, we must have

$$\alpha\alpha^{-1} = 1. \qquad (30)$$

Similarly,

$$\alpha^{-1}\alpha = 1. \qquad (31)$$

Either of these equations is sufficient to determine $\alpha^{-1}$ completely, provided $\alpha$ does
not have the eigenvalue zero. To prove this in the case of (30), let $\xi$ be any linear
operator satisfying

$$\alpha\xi = 1$$

and multiply both sides on the left by the $\alpha^{-1}$ defined by (29). The result is

$$\alpha^{-1}\alpha\xi = \alpha^{-1}$$

and hence, from (31),

$$\xi = \alpha^{-1}.$$
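The reciprocal defined by (29) can be built in the same way as any other function of an observable and then compared with a general-purpose matrix inverse. The sketch assumes NumPy and an illustrative 2×2 Hermitian matrix whose eigenvalues, $(5 \pm \sqrt{5})/2$, are both nonzero.

```python
import numpy as np

alpha = np.array([[2.0, 1j],
                  [-1j, 3.0]])              # Hermitian, no zero eigenvalue (illustrative)
vals, vecs = np.linalg.eigh(alpha)
if np.any(np.abs(vals) < 1e-12):
    raise ValueError("alpha has the eigenvalue zero: no reciprocal")

# Equation (29): alpha^{-1} psi_a = a^{-1} psi_a on every eigen-psi.
alpha_inv = vecs @ np.diag(1.0 / vals) @ vecs.conj().T

one = np.eye(2)
print(np.allclose(alpha @ alpha_inv, one),       # (30)
      np.allclose(alpha_inv @ alpha, one),       # (31)
      np.allclose(alpha_inv, np.linalg.inv(alpha)))
```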


Equations (30) and (31) can be used to define the reciprocal, when it exists,
of a general linear operator $\alpha$, which need not be Hermitian or represent
an observable. One of these equations by itself is then not necessarily sufficient.
If any two linear operators $\alpha$ and $\beta$ have reciprocals, their product $\alpha\beta$ has
the reciprocal

$$(\alpha\beta)^{-1} = \beta^{-1}\alpha^{-1}, \qquad (32)$$

obtained by taking the reciprocal of each factor and reversing their order.
We verify (32) by noting that its right-hand side gives unity when multiplied
by $\alpha\beta$, either on the right or on the left. This reciprocal law for products can be
immediately extended to more than two factors, i.e.,

$$(\alpha\beta\gamma\cdots)^{-1} = \cdots\gamma^{-1}\beta^{-1}\alpha^{-1}.$$
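The reciprocal law (32) holds for linear operators that need not be Hermitian. In this NumPy sketch the three invertible non-Hermitian matrices are arbitrary choices of ours. Keeping the original order of the factors gives the reciprocal of $\beta\alpha$ instead, which differs from that of $\alpha\beta$ because these matrices do not commute.

```python
import numpy as np

# Three invertible, non-Hermitian matrices (illustrative).
alpha = np.array([[1.0, 2.0], [0.0, 1.0]])
beta = np.array([[0.0, 1.0], [-1.0, 3.0]])
gamma = np.array([[2.0, 0.0], [1.0, 1j]])

inv = np.linalg.inv
lhs = inv(alpha @ beta)
rhs = inv(beta) @ inv(alpha)            # (32): reciprocals taken in reversed order
wrong = inv(alpha) @ inv(beta)          # original order: the reciprocal of beta @ alpha

triple_ok = np.allclose(inv(alpha @ beta @ gamma), inv(gamma) @ inv(beta) @ inv(alpha))
print(np.allclose(lhs, rhs), np.allclose(lhs, wrong), triple_ok)   # → True False True
```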


The square root of an observable exists when the observable has no negative
eigenvalues. If the observable is represented by the Hermitian operator $\alpha$,
the square-root observable will be represented by a Hermitian operator,
which we call $\sqrt{\alpha}$ or $\alpha^{1/2}$, satisfying

$$\sqrt{\alpha}\,\psi_a = \pm\sqrt{a}\,\psi_a, \qquad (33)$$

$\psi_a$ being an eigen-ψ of $\alpha$ belonging to the eigenvalue $a$. Hence

$$\sqrt{\alpha}\sqrt{\alpha}\,\psi_a = \sqrt{\alpha}\,(\pm\sqrt{a}\,\psi_a) = \pm\sqrt{a}\,\sqrt{\alpha}\,\psi_a = a\psi_a,$$

and since this holds for any eigen-ψ $\psi_a$, we must have

$$\sqrt{\alpha}\sqrt{\alpha} = \alpha. \qquad (34)$$

On account of the ambiguity of sign in (33) there will be several square-root
observables. To fix one of them we must specify a particular sign in (33) for
each eigenvalue. This sign may vary irregularly from one eigenvalue to the next
and equation (33) will always define a Hermitian operator $\sqrt{\alpha}$ satisfying (34)
and representing an observable which can legitimately be called a square root of
our original observable. If there is an eigenvalue of $\alpha$ with two or more independent
eigen-ψ’s belonging to it, then we must, according to our definition of a function,
have the same sign in (33) for each of these eigen-ψ’s. If we had different signs,
however, equation (34) would still hold, and hence equation (34) by itself is
not sufficient to define $\sqrt{\alpha}$, except in the special case when there is only one
independent eigen-ψ of $\alpha$ belonging to any eigenvalue.

The number of different square roots of any observable which has no negative
eigenvalues is $2^n$, where $n$ is the total number of eigenvalues (or $2^{n-1}$ if one of
the eigenvalues is zero). The square root mostly used in practice is the one for
which the positive sign is always taken in (33). This one will be called the positive
square root.
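The count of square roots can be reproduced by enumerating one sign per distinct eigenvalue in (33). The helper `square_roots` is our own illustrative name (NumPy and `itertools` assumed), and $n$ here is the number of distinct eigenvalues: a repeated eigenvalue is counted once and gets a single sign, as the definition of a function requires.

```python
import itertools
import numpy as np

def square_roots(alpha, tol=1e-9):
    """All Hermitian square roots allowed by (33): one sign per distinct eigenvalue,
    the same sign on every eigen-psi of that eigenvalue. (Hypothetical helper;
    tol decides when two eigenvalues count as equal.)"""
    vals, vecs = np.linalg.eigh(alpha)
    if np.any(vals < -tol):
        raise ValueError("negative eigenvalue: no square-root observable")
    distinct = []
    for a in vals:
        if not any(abs(a - d) < tol for d in distinct):
            distinct.append(a)
    # label[i] = index of the distinct eigenvalue to which vals[i] belongs
    label = [next(j for j, d in enumerate(distinct) if abs(a - d) < tol) for a in vals]
    roots = []
    for signs in itertools.product((1.0, -1.0), repeat=len(distinct)):
        diag = [signs[j] * np.sqrt(max(a, 0.0)) for a, j in zip(vals, label)]
        r = vecs @ np.diag(diag) @ vecs.conj().T
        if not any(np.allclose(r, q) for q in roots):   # sign on a zero eigenvalue is irrelevant
            roots.append(r)
    return roots

alpha1 = np.diag([1.0, 1.0, 4.0, 9.0])   # three distinct eigenvalues, one of them repeated
alpha2 = np.diag([0.0, 4.0, 9.0])        # three distinct eigenvalues, one of them zero
roots1, roots2 = square_roots(alpha1), square_roots(alpha2)
print(len(roots1), len(roots2))          # → 8 4
positive = roots1[0]                     # all signs +: the positive square root
```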


See, for example, the e and e~™ of §36, equation (58) 




12. The General Physical Interpretation 


The assumption that we introduced in §9 to get a physical interpretation of our
mathematics is of a rather special kind, since it can be used only in connexion
with an equation of the special type (12). We need some more general assumption
which will enable us to extract physical information from our mathematics even
when we have no equation of the type (12).

In classical mechanics an observable always, as we say, ‘has a value’ for any
particular state of the system. What is there in quantum mechanics corresponding
to this? If we take any observable, represented by the Hermitian operator $\alpha$ say,
and any two states, represented by the vectors $\phi_x$ and $\psi_y$ say, then we can form
the number $\phi_x\alpha\psi_y$. This number is not very closely analogous to the value
which an observable can ‘have’ in the classical theory, for three reasons, namely,
(i) it refers to two states of the system, while the classical value always refers
to one, (ii) it is in general not a real number, and (iii) it is not uniquely determined
by the observable and the states, since the vectors $\phi_x$ and $\psi_y$ contain arbitrary
numerical factors. Even if we impose on $\phi_x$ and $\psi_y$ the condition that they
shall be normalized, there will still be an undetermined factor of modulus unity
in $\phi_x\alpha\psi_y$. These three reasons cease to apply, however, if we take the two states
to be identical. The number that we then get, namely $\phi_x\alpha\psi_x$, is necessarily real,
as may be seen from equation (10) with the suffix $y$ replaced by $x$. Also it is
uniquely determined, with the help of the conditions that $\phi_x$ and $\psi_x$ are conjugate
imaginary vectors and both normalized, since if we multiply $\phi_x$ by the numerical
factor $e^{ic}$, $c$ being some real number, we must multiply $\psi_x$ by $e^{-ic}$ and $\phi_x\alpha\psi_x$ will
be unaltered.

One might thus be inclined to make the tentative assumption that
the observable represented by $\alpha$ ‘has the value’ $\phi_x\alpha\psi_x$ for the state represented
by $\phi_x$ or $\psi_x$, in a sense analogous to the classical sense. This would not be
satisfactory, though, for the following reason. Let us take a second observable,
represented by the Hermitian operator $\beta$, and thus by the above assumption having
the value $\phi_x\beta\psi_x$ for this same state. We should expect, from classical analogy,
that, for the same state again, the sum of the two observables would have a value
equal to the sum of the values of the two observables separately and the product
of the two observables would have a value equal to the product of the values of
the two observables separately. Actually, the tentative assumption would give for
the sum of the two observables the value $\phi_x(\alpha + \beta)\psi_x$, which is, in fact, equal to
the sum of $\phi_x\alpha\psi_x$ and $\phi_x\beta\psi_x$, but for the product it would give the value $\phi_x\alpha\beta\psi_x$
or $\phi_x\beta\alpha\psi_x$, neither of which is connected in any simple way with $\phi_x\alpha\psi_x$ and $\phi_x\beta\psi_x$.
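The contrast between sums and products shows up already in two dimensions. The sketch assumes NumPy; the operators and the normalized vector are illustrative choices of ours, and the helper `value(op)` computes $\phi_x\,\mathrm{op}\,\psi_x$ with $\phi_x$ the conjugate imaginary of $\psi_x$. The values for $\alpha$, $\beta$ and $\alpha + \beta$ are real and additive, while here $\phi_x\alpha\beta\psi_x$ is not even real.

```python
import numpy as np

# Illustrative Hermitian operators and a normalized state (our choice).
alpha = np.array([[1.0, 1.0], [1.0, -1.0]])
beta = np.array([[0.0, -1j], [1j, 0.0]])
psi = np.array([1.0, 0.5 + 0.5j])
psi = psi / np.linalg.norm(psi)            # normalized psi_x

def value(op):
    """phi_x op psi_x, where phi_x is the conjugate imaginary of psi_x."""
    return np.vdot(psi, op @ psi)

print(value(alpha), value(beta))                                   # real numbers
print(np.isclose(value(alpha + beta), value(alpha) + value(beta))) # additive
print(value(alpha @ beta), value(beta @ alpha))                    # not the product below
print(value(alpha) * value(beta))
```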

However, since things go wrong only with the product and not with the sum,
it would be reasonable to call $\phi_x\alpha\psi_x$ the average value of the observable represented
by $\alpha$ for the state represented by $\phi_x$ or $\psi_x$. This is because the average of
the sum of two quantities must equal the sum of their averages, but the average
of their product need not equal the product of their averages. We therefore make
the general assumption that if the measurement of the observable represented by $\alpha$,
for the system in the state represented by $\psi_x$, is made a large number of times,
the average of all the results obtained will be $\phi_x\alpha\psi_x$, provided $\phi_x$ and $\psi_x$ are
normalized. This assumption provides a general method for physical interpretation
of the mathematics. We shall see a little later that the assumption of §9 is deducible
from this one.

The expression that an observable ‘has a particular value’ for a particular state 
is permissible in quantum mechanics in the special case when a measurement 
of the observable is certain to lead to the particular value, so that an equation 
of the type (12) holds. It may easily be verified from the algebra that, 
with this restricted meaning for an observable ‘having a value’, if two observables 
have values for a particular state, then for this same state the sum of the two 
observables (if this sum is an observable†) has a value equal to the sum of
the values of the two observables separately and the product of the two observables
(if this product is an observable‡) has a value equal to the product of the values
of the two observables separately. 

In the general case we cannot speak of an observable having a value for 
a particular state, but we can speak of its having an average value for the state. 
We can go further and speak of the probability of its having any specified value 
for the state, meaning the probability of this specified value being obtained when 
one makes a measurement of the observable. This probability can be calculated 
from the general assumption for physical interpretation in the following way. 

Take any observable, represented by the Hermitian operator $\alpha$, and any state,
represented by the normalized ψ-vector $\psi_x$. Then the average value of
the observable for the state will be $\phi_x\alpha\psi_x$. More generally, the average value
of any function $f$ of the observable will be $\phi_x f(\alpha)\psi_x$. This provides us
with sufficient information to calculate the probability of the observable having
any specified value. Suppose we expand $\psi_x$ in terms of eigen-ψ’s of $\alpha$, thus

$$\psi_x = \sum_a \psi_{xa}, \qquad (35)$$

where $\psi_{xa}$ is an eigen-ψ of $\alpha$ belonging to the eigenvalue $a$. Then, by the same
analysis as led to (27), with the suffix $y$ replaced by $x$, we obtain

$$\phi_x f(\alpha)\psi_x = \sum_a f(a)\,\phi_{xa}\psi_{xa}.$$


†This is not obviously so, since the sum of the Hermitian operators representing the two
observables may perhaps not satisfy the expansion theorem.
‡Here the Hermitian condition may fail, as well as the expansion theorem.




Now if $P(a)$ is the probability of the result $a$ being obtained from a measurement
of the observable, the average value of the function $f$ of the observable must be
$\sum_a f(a)P(a)$, from the ordinary rules of probability, the summation being over all
values $a$ which are possible results of the measurement. Hence

$$\sum_a f(a)P(a) = \sum_a f(a)\,\phi_{xa}\psi_{xa}.$$

This equation holds for an arbitrary function $f$, so that $f(a)$ can be an arbitrary
number for each value of $a$. Hence we can equate coefficients of $f(a)$, which gives

$$P(a) = \phi_{xa}\psi_{xa}. \qquad (36)$$

Thus the probability of the observable having any value $a$ is equal to the square
of the length of the corresponding eigen-ψ in the expansion (35). If $a$ is not
an eigenvalue, there will be no eigen-ψ corresponding to it in the expansion (35)
and the probability must be zero. This proves the theorem, stated without proof
in §9, that every possible result of the measurement of an observable is one of
its eigenvalues. It is easily confirmed that the expression (36) for $P(a)$ gives unity
for the total probability of the observable having as value any one of its eigenvalues.
From the condition that $\psi_x$ is normalized, we get

$$1 = \phi_x\psi_x = \sum_a \phi_{xa}\sum_{a'}\psi_{xa'} = \sum_a \phi_{xa}\psi_{xa} = \sum_a P(a),$$

making use of the expansion (35) and its conjugate imaginary, and also of
the orthogonality theorem.
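Formula (36) is easy to evaluate for a finite Hermitian matrix. The sketch below assumes NumPy; the operator (with the doubly degenerate eigenvalue 3) and the state are illustrative choices of ours. It computes $P(a)$ as the squared length of the component of $\psi_x$ in the eigenspace of each eigenvalue, checks that the probabilities add up to unity and that $\sum_a aP(a)$ equals $\phi_x\alpha\psi_x$, and then simulates a large number of measurements by sampling results with these probabilities, so that their average can be compared with $\phi_x\alpha\psi_x$ as in the general assumption of this section.

```python
import numpy as np

alpha = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 3.0]])             # eigenvalues 1, 3, 3 (illustrative)
psi = np.array([1.0, 0.0, 1j]) / np.sqrt(2)     # normalized psi_x (arbitrary)

vals, vecs = np.linalg.eigh(alpha)
P = {}
for a, v in zip(np.round(vals, 9), vecs.T):
    # (36): P(a) is the squared length of the component of psi_x in the eigenspace of a
    P[a] = P.get(a, 0.0) + abs(np.vdot(v, psi)) ** 2

total = sum(P.values())                          # should be unity
mean_from_P = sum(a * p for a, p in P.items())   # sum_a a P(a)
mean_direct = np.vdot(psi, alpha @ psi).real     # phi_x alpha psi_x

# Repeated measurements simulated by sampling results with probabilities P(a).
rng = np.random.default_rng(0)
outcomes = rng.choice(list(P.keys()), size=100_000, p=list(P.values()))
print(P, total, mean_from_P, mean_direct, outcomes.mean())
```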

We can now see that the assumption for physical interpretation made in §9 is
deducible from the one made in the present section. Let us apply the formula (36),
which was obtained entirely from the physical interpretation of the present section,
without the help of that of §9, to the case of a state which is an eigenstate of
the observable we are interested in. Then $\psi_x$ will be an eigen-ψ and the expansion
on the right-hand side of (35) will contain only one term. This term will be
normalized, so the square of its length will be unity. Formula (36) now tells us
that the probability of the observable having any given value is unity if this value
is the eigenvalue to which the state belongs and zero otherwise. This is just
the converse of the initial assumption of §9. The assumption itself can be deduced
by a reversal of the argument.

We have been all the time taking the case when the expansion (35) is in the form
of a sum and not an integral, and supposing, to agree with this, that the possible
results of a measurement of the observable form a discrete set of numbers and not
a continuous range. The case of integrals and continuous ranges will be dealt with
in Chapter IV.

In practical applications of quantum mechanics it is nearly always more 
convenient to obtain the physical interpretation of the mathematics from 
formula (36) or something equivalent, instead of from a direct application of 
the expression for the average value of an observable. 


13. Commutability and Compatibility 


A state may be simultaneously an eigenstate of two observables. If the state is
represented by the ψ-vector ψ and the observables are represented by the Hermitian
operators $\alpha$ and $\beta$, we should then have the equations

$$\alpha\psi = a\psi, \qquad \beta\psi = b\psi,$$

where $a$ and $b$ are numbers. We can now deduce

$$\alpha\beta\psi = b\alpha\psi = ab\psi = a\beta\psi = \beta\alpha\psi,$$

or

$$(\alpha\beta - \beta\alpha)\psi = 0.$$

This suggests that the chances for the existence of a simultaneous eigenstate are
most favourable if $\alpha\beta - \beta\alpha = 0$ and the two observables commute. If they do not
commute a simultaneous eigenstate is not impossible, but is rather exceptional.
On the other hand, if they do commute there exist so many simultaneous
eigenstates that, as will now be proved, an arbitrary state is dependent on them.
We thus get a generalization of the expansion theorem of §10.

Let $\alpha$ and $\beta$ be the Hermitian operators representing any two commuting
observables. Take any eigen-ψ of $\alpha$, say the eigen-ψ $\psi_a$, belonging to
the eigenvalue $a$, and expand it in terms of eigen-ψ’s of $\beta$, thus

$$\psi_a = \sum_b \psi_b, \qquad (37)$$

where $\psi_b$ is an eigen-ψ of $\beta$ belonging to the eigenvalue $b$. This expansion must be
possible from §10. From the equation

$$(\alpha - a)\psi_a = 0$$

we get

$$\sum_b (\alpha - a)\psi_b = 0. \qquad (38)$$

Now $\alpha\psi_b$ is an eigen-ψ of $\beta$ belonging to the eigenvalue $b$, since

$$\beta(\alpha\psi_b) = \alpha\beta\psi_b = \alpha b\psi_b = b(\alpha\psi_b).$$

Hence $(\alpha - a)\psi_b$ is also an eigen-ψ of $\beta$ belonging to the eigenvalue $b$. Thus every
term in the sum in (38) is an eigen-ψ of $\beta$ and each belongs to a different eigenvalue,
since each term in the sum in (37) may be assumed to correspond to a different
eigenvalue. Now from a theorem of §9, eigen-ψ’s belonging to different eigenvalues
are necessarily independent. It follows that every term in (38) vanishes separately.
Thus

$$(\alpha - a)\psi_b = 0$$

and each $\psi_b$ is an eigen-ψ of $\alpha$ belonging to the eigenvalue $a$ as well as
being an eigen-ψ of $\beta$. Equation (37) therefore gives $\psi_a$ expanded in terms of
simultaneous eigen-ψ’s of $\alpha$ and $\beta$. Since any ψ can be expanded in terms of $\psi_a$’s,
it follows that any ψ can be expanded in terms of simultaneous eigen-ψ’s of $\alpha$
and $\beta$.
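The proof above is constructive and can be followed step by step for matrices: within each eigenspace of $\alpha$ one diagonalizes $\beta$, which maps that eigenspace into itself because the two operators commute. The helper `simultaneous_eigenvectors` is our own hypothetical name; NumPy, the unitary `U` and the two commuting matrices are illustrative assumptions, chosen so that $\alpha$ alone is degenerate and $\beta$ separates its eigen-ψ’s.

```python
import numpy as np

def simultaneous_eigenvectors(alpha, beta, tol=1e-9):
    """Simultaneous eigen-psi's of two commuting Hermitian matrices, built as in the
    proof: inside each eigenspace of alpha, expand in eigen-psi's of beta.
    Returns (a, b, psi_ab) triples. Hypothetical helper; tol groups equal eigenvalues."""
    assert np.allclose(alpha @ beta, beta @ alpha), "observables must commute"
    vals, vecs = np.linalg.eigh(alpha)
    triples = []
    for a in np.unique(np.round(vals, 9)):
        Q = vecs[:, np.abs(vals - a) < tol]                 # basis of the eigenspace of a
        b_vals, c = np.linalg.eigh(Q.conj().T @ beta @ Q)   # beta restricted to that space
        triples += [(a, b, Q @ col) for b, col in zip(b_vals, c.T)]
    return triples

# Illustrative commuting pair built from a common unitary U (our choice).
U, _ = np.linalg.qr(np.array([[1.0, 2.0, 0.0],
                              [0.0, 1.0, 1j],
                              [1.0, 0.0, 1.0]]))
alpha = U @ np.diag([1.0, 1.0, 2.0]) @ U.conj().T
beta = U @ np.diag([5.0, 7.0, 5.0]) @ U.conj().T

triples = simultaneous_eigenvectors(alpha, beta)
for a, b, psi in triples:
    print(f"a = {a:.3f}, b = {b:.3f}")
```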

The converse theorem, which says that two observables must commute
if an arbitrary ψ can be expanded in terms of their simultaneous eigen-ψ’s,
is also true. To prove it, let $\alpha$ and $\beta$ be the Hermitian operators representing
the two observables and let $\psi_{ab}$ be one of their simultaneous eigen-ψ’s belonging
to the eigenvalues $a$ and $b$. We then have

$$(\alpha\beta - \beta\alpha)\psi_{ab} = (ab - ba)\psi_{ab} = 0.$$

Hence

$$(\alpha\beta - \beta\alpha)\psi = 0,$$

where ψ is any ψ-symbol that can be expanded in terms of the $\psi_{ab}$’s. If this is true
for an arbitrary ψ, we can infer that

$$\alpha\beta - \beta\alpha = 0,$$

as required.

The idea of simultaneous eigen-ψ’s may obviously be extended to more than
two observables and the theorem proved above still holds, i.e., an arbitrary ψ
can be expanded in terms of the simultaneous eigen-ψ’s of any set of observables
that commute, and so does its converse. The same arguments used for the proof
in the case of two observables are adequate for the general case; e.g., if we have
three observables, represented by the Hermitian operators $\alpha$, $\beta$, $\gamma$, that commute,
each with the other two, we can expand any simultaneous eigen-ψ of $\alpha$ and $\beta$ in
terms of eigen-ψ’s of $\gamma$ and then show that each of these eigen-ψ’s of $\gamma$ is also
an eigen-ψ of $\alpha$ and $\beta$.

Two simultaneous eigen-ψ’s must be orthogonal if the sets of eigenvalues
to which they belong differ in any way, since they then belong to different eigenvalues
of at least one of the observables and the orthogonality theorem applies.

Owing to the validity of the expansion theorem for two or more commuting observables, we can set up a theory of functions of two or more commuting observables on the same lines as the theory of functions of a single observable given in §11. If the commuting observables are represented by the Hermitian operators $\alpha, \beta, \gamma, \ldots$, we define a general function $f$ of them to be that observable represented by the Hermitian operator $f(\alpha, \beta, \gamma, \ldots)$ which satisfies

$$f(\alpha, \beta, \gamma, \ldots)\,\psi_{abc\ldots} = f(a, b, c, \ldots)\,\psi_{abc\ldots}, \qquad (39)$$

where $\psi_{abc\ldots}$ is any simultaneous eigen-$\psi$ of $\alpha, \beta, \gamma, \ldots$ belonging to the eigenvalues $a, b, c, \ldots$. Here $f$ is any function such that $f(a, b, c, \ldots)$ is defined for all values of $a, b, c, \ldots$ which are eigenvalues of $\alpha, \beta, \gamma, \ldots$ respectively. The linear operator $f(\alpha, \beta, \gamma, \ldots)$ is completely defined by (39). We can obtain the result of its operating on an arbitrary $\psi$ by expanding this $\psi$ in terms of the simultaneous eigen-$\psi$'s $\psi_{abc\ldots}$ and operating on each term in the expansion separately.

† See second footnote on p. 35.
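Definition (39) can be sketched in the same finite-dimensional setting, which is an assumption here. The operator $f(\alpha, \beta, \ldots)$ is built by letting it multiply each simultaneous eigen-$\psi$ by $f(a, b, \ldots)$. The helper `function_of_commuting` is hypothetical. With a real-valued `f` the result is Hermitian, as the text requires.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; its columns are an orthonormal set of psi's."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def function_of_commuting(f, basis, *eigenvalue_lists):
    """Operator defined by eq. (39): column k of `basis` is a simultaneous eigen-psi,
    and eigenvalue_lists[j][k] is its eigenvalue for the j-th observable."""
    values = [f(*vals) for vals in zip(*eigenvalue_lists)]
    return basis @ np.diag(values) @ basis.conj().T

U = random_unitary(3, seed=1)
a = [1.0, 2.0, 2.0]   # eigenvalues of alpha on the three eigen-psi's
b = [0.0, 0.0, 3.0]   # eigenvalues of beta on the same eigen-psi's

alpha = function_of_commuting(lambda x: x, U, a)
beta = function_of_commuting(lambda y: y, U, b)
product = function_of_commuting(lambda x, y: x * y, U, a, b)                # should equal alpha beta
mixed = function_of_commuting(lambda x, y: np.exp(x) + y**2, U, a, b)      # a non-polynomial f
```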

We can now proceed to generalize the result (36). Suppose the normalized vector $\psi_x$ representing any state to be expanded in terms of simultaneous eigen-$\psi$'s of $\alpha, \beta, \gamma, \ldots$, thus

$$\psi_x = \sum_{abc\ldots} c_{abc\ldots}\,\psi_{abc\ldots}. \qquad (40)$$

Working from this equation instead of (35), we obtain, by an argument analogous to that which led to (36), the probability for this state of the results $a, b, c, \ldots$ being obtained when measurements are made of the observables represented by $\alpha, \beta, \gamma, \ldots$. It is

$$P(a, b, c, \ldots) = |c_{abc\ldots}|^2. \qquad (41)$$
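Here is a sketch of (40) and (41) under the same finite-dimensional assumption. It further assumes that each set of eigenvalues $a, b, \ldots$ has just one simultaneous eigen-$\psi$. The expansion coefficients are obtained as $\phi_{ab}\psi_x$, that is, as the conjugate-transposed eigen-$\psi$'s applied to $\psi_x$. The eigenvalue labels are hypothetical.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; its columns are normalized, mutually orthogonal psi's."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

U = random_unitary(3, seed=2)                   # columns: normalized psi_ab
labels = [(1.0, 0.0), (2.0, 0.0), (2.0, 3.0)]   # distinct (a, b) for each column (assumed)

rng = np.random.default_rng(3)
psi_x = rng.normal(size=3) + 1j * rng.normal(size=3)
psi_x /= np.linalg.norm(psi_x)                  # normalized state vector

c = U.conj().T @ psi_x                          # coefficients c_ab in eq. (40)
P = {lab: abs(ck) ** 2 for lab, ck in zip(labels, c)}   # probabilities, eq. (41)
```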


We can now conclude, in the first place, that we can give a meaning to 
the probability of definite results being obtained for simultaneous measurements 
of several commuting observables. This is not a trivial conclusion. In general
one cannot make an observation on a system in a definite state without disturbing 
that state and spoiling it for the purposes of a second observation. One cannot 
then give any meaning at all to the two observations being made simultaneously. 
The above conclusion tells us, though, that in the special case when the two 
observables commute, the observations are to be considered as non-interfering or 
compatible, in such a way that one can give a meaning to the two observations 
being made simultaneously and can discuss the probability of any particular results 
being obtained. The two observations may, in fact, be considered as a single 
observation of a more complicated type, the result of which is expressible by 
two numbers instead of a single number. From the point of view of general 
theory, any two or more commuting observables may be counted as a single 
observable, the result of a measurement of which consists of two or more numbers. 
The states for which this measurement is certain to lead to one particular result 
are the simultaneous eigenstates. 

The numerical value of the probability (41) is very important for applications 
of quantum mechanics. 


III. REPRESENTATION THEORY
FOR DISCRETE EIGENVALUES


14. The Bracket Notation 


THE preceding chapter dealt with the fundamental laws governing states and 
observables in quantum mechanics and included all the axioms of the underlying 
mathematical formalism as well as the assumptions for physical interpretation of 
the mathematics. The present chapter and the following one will be concerned, 
not with making new laws and assumptions, but with systematizing and developing 
ideas and methods already introduced, and generally with arranging the theory in 
a form fitted for the subsequent applications. One matter that we must deal with 
is the setting up of a suitable notation for coordinates—a notation which can be 
consistently followed all through the future very extensive use of coordinates and 
is at the same time as simple and as easily remembered as possible. 

In order to define a system of coordinates we must specify a set of $\psi$'s with the following properties. (i) They are all orthogonal to each other. (ii) Each of them is normalized. (iii) There are so many of them that an arbitrary $\psi$ is dependent on them, so that if the space has a finite number of dimensions there must be the same number of these $\psi$'s. Such a set of $\psi$'s will be called a set of basic $\psi$'s for a coordinate system. The coordinates of any $\psi$ will then be its coefficients when expanded in terms of the basic $\psi$'s. We shall denote a coordinate associated with a basic $\psi$, $\psi_r$, by the bracket expression $(r|)$. Thus

$$\psi = \sum_r \psi_r\,(r|).$$


We put the coordinates $(r|)$ to the right of their corresponding $\psi_r$'s in order to conform to a certain helpful style of writing, which will be developed as we go along. Suppose we want to denote the coordinates of some particular $\psi$, specified by a suffix, $x$ say. We then put this suffix in the space to the right of the vertical line, so that the coordinate of $\psi_x$ associated with the basic $\psi$, $\psi_r$, is $(r|x)$. Thus

$$\psi_x = \sum_r \psi_r\,(r|x). \qquad (1)$$


The notation implies some kind of symmetry in the way a coordinate (r|x) depends 
on r and on x. We shall see in §17 that there is such a symmetry. 




The conjugate imaginaries of a set of basic $\psi$'s will be a set of basic $\phi$'s, defining a system of coordinates in the $\phi$-space. The coordinates of an arbitrary $\phi$ will be its coefficients when expanded in terms of the basic $\phi$'s. We denote the coordinate associated with the basic $\phi$, $\phi_r$, by the bracket expression $(|r)$. Thus

$$\phi = \sum_r (|r)\,\phi_r,$$

our style of writing now requiring the coordinates to be put on the left. The corresponding coordinate of a particular $\phi$, $\phi_x$, is denoted by $(x|r)$, so that

$$\phi_x = \sum_r (x|r)\,\phi_r. \qquad (2)$$


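The conditions of orthogonality and normalization which the basic $\psi$'s and $\phi$'s have to satisfy may all be expressed by the equation

$$\phi_r\psi_s = \delta_{rs}, \qquad (3)$$

where the symbol $\delta_{rs}$, which we shall often use in the future, has the meaning

$$\delta_{rs} = \begin{cases} 0 & \text{when } r \neq s, \\ 1 & \text{when } r = s. \end{cases} \qquad (4)$$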


If we multiply equation (1) by $\phi_r$ on the left, we get, after first changing the dummy suffix $r$ on the right-hand side into $s$,

$$\phi_r\psi_x = \sum_s \delta_{rs}\,(s|x) = (r|x), \qquad (5)$$

with the help of (3). This gives us an explicit expression for the coordinate $(r|x)$ of $\psi_x$. A similar explicit expression for the coordinate $(x|r)$ of $\phi_x$ may be obtained by multiplying (2) by $\psi_r$ on the right, the result being

$$\phi_x\psi_r = (x|r). \qquad (6)$$
Other formulas which obviously hold are

$$\overline{(r|x)} = (x|r) \qquad (7)$$

and

$$\phi_x\psi_y = \sum_r (x|r)(r|y). \qquad (8)$$
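The bracket formulas (1), (3) and (5) to (8) can be checked numerically in a finite-dimensional sketch. The assumptions are these: the basic $\psi$'s are the columns of a unitary matrix, each $\phi$ is the complex-conjugate row of the corresponding $\psi$, and products such as $\phi_r\psi_x$ are ordinary matrix products. The variable names are illustrative.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; column r plays the role of the basic psi_r."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

basis = random_unitary(3, seed=4)     # column r is the basic psi_r
phi_basis = basis.conj().T            # row r is the basic phi_r

rng = np.random.default_rng(5)
psi_x = rng.normal(size=3) + 1j * rng.normal(size=3)
psi_y = rng.normal(size=3) + 1j * rng.normal(size=3)
phi_x = psi_x.conj()                  # conjugate imaginary of psi_x, used as a row

ket_coords_x = phi_basis @ psi_x      # (r|x) = phi_r psi_x, eq. (5)
ket_coords_y = phi_basis @ psi_y      # (r|y)
bra_coords_x = phi_x @ basis          # (x|r) = phi_x psi_r, eq. (6)
```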




The coordinates of a linear operator $\alpha$ (Hermitian or not), which we previously denoted by $\alpha_{rs}$, will now be denoted by $(r|\alpha|s)$. These coordinates are defined by equations (6) and (5) of the preceding chapter, which become, in our present notation,

$$\psi_y = \alpha\psi_x, \qquad (r|y) = \sum_s (r|\alpha|s)(s|x). \qquad (9)$$

If the linear operator is considered to operate to the left on $\phi$-vectors, the corresponding equations are

$$\phi_y = \phi_x\alpha, \qquad (y|s) = \sum_r (x|r)(r|\alpha|s). \qquad (10)$$


By putting $\psi_x$ in (9) equal to $\psi_s$, one of the basic $\psi$'s, with its $s$-th coordinate unity and all the others zero, we get

$$(r|y) = (r|\alpha|s).$$

Thus $(r|\alpha|s)$ is the $r$-th coordinate of $\alpha\psi_s$, or, from (1),

$$\alpha\psi_s = \sum_r \psi_r\,(r|\alpha|s). \qquad (11)$$

This equation may be taken as an alternative definition of the coordinates $(r|\alpha|s)$. The corresponding equation in terms of $\phi$'s, which may be derived by putting $\phi_x = \phi_r$ in (10), is

$$\phi_r\alpha = \sum_s (r|\alpha|s)\,\phi_s. \qquad (12)$$

By applying the formula (5) with $\alpha\psi_s$ for $\psi_x$, we obtain an explicit expression for the $r$-th coordinate $(r|\alpha|s)$ of $\alpha\psi_s$, namely

$$(r|\alpha|s) = \phi_r\alpha\psi_s. \qquad (13)$$


We could alternatively have obtained an explicit expression for $(r|\alpha|s)$ by considering it to be a coordinate of $\phi_r\alpha$, as in (12), and using the formula (6). The result would have been the same. Our method of introducing a meaning for $\alpha$ as an operator to the left on $\phi$-vectors was chosen so as to secure this agreement. We have seen already that multiplication by a number, $k$ say, is a special case of a linear operator. It may easily be deduced from (11) or (13) that the coordinates of this operator are

$$(r|k|s) = k\,\delta_{rs}. \qquad (14)$$
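Here is a finite-dimensional sketch of (9) and (11) to (14). The assumptions are the same as in a matrix picture of the bracket notation: the basic $\psi$'s are the columns of a unitary matrix and the $\phi$'s are their conjugate rows. The operator `alpha` is an arbitrary, non-Hermitian complex matrix, and all names are illustrative.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; its columns are the basic psi's."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

n = 3
basis = random_unitary(n, seed=6)        # columns: basic psi_r
phi_basis = basis.conj().T               # rows: basic phi_r
rng = np.random.default_rng(7)
alpha = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))  # general linear operator

rep = phi_basis @ alpha @ basis          # (r|alpha|s) = phi_r alpha psi_s, eq. (13)

psi_x = rng.normal(size=n) + 1j * rng.normal(size=n)
psi_y = alpha @ psi_x                    # first of eqs. (9)

k = 2.5 - 1.0j
rep_k = phi_basis @ (k * np.eye(n)) @ basis   # multiplication by the number k
```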




All the coordinates vanish except those on the diagonal, i.e., those for which $r = s$, and the latter all have the value $k$. The identical operator, i.e., multiplication by unity, has for its coordinates just the numbers $\delta_{rs}$, forming a matrix which is called the unit matrix.

The general assumption of §12 gives us a physical meaning for a diagonal coordinate $(r|\alpha|r)$ of any linear operator $\alpha$ that represents an observable: $(r|\alpha|r)$ is the average value of the observable represented by $\alpha$ for the basic state represented by $\psi_r$.

A system of coordinates for $\psi$- and $\phi$-vectors and linear operators will be called in future a representation. The coordinates of any $\psi$- or $\phi$-vector or linear operator will be called the representative of that quantity and will be said to represent that quantity. They may also be called the representative of the corresponding state or observable.


15. Matrix Multiplication 


Suppose we are given the representatives of two linear operators $\alpha$ and $\beta$. What will be the representative of their product $\alpha\beta$? Using the formula (11) twice, we get

$$\alpha\beta\psi_t = \alpha\sum_s \psi_s\,(s|\beta|t) = \sum_{r,s} \psi_r\,(r|\alpha|s)(s|\beta|t). \qquad (15)$$

But from this same formula we also have

$$\alpha\beta\psi_t = (\alpha\beta)\psi_t = \sum_r \psi_r\,(r|\alpha\beta|t). \qquad (16)$$

Equating coefficients of $\psi_r$ in (15) and (16), we obtain

$$(r|\alpha\beta|t) = \sum_s (r|\alpha|s)(s|\beta|t), \qquad (17)$$

which gives us the representative of $\alpha\beta$ in terms of those of $\alpha$ and $\beta$.

If the representatives of our linear operators are regarded as forming matrices, equation (17) gives us the matrix law of multiplication, well known in pure mathematics. The element in the $r$-th row and $t$-th column of the product matrix is the sum of the products of each element in the $r$-th row of the first factor matrix with the corresponding element in the $t$-th column of the second factor matrix.
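The matrix law (17) can be written out as an explicit sum over the intermediate index $s$ and compared with the representative of the product computed directly. This is a sketch in a finite-dimensional space. The function names are hypothetical, and the loop is deliberately naive so that it mirrors (17) term by term.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; its columns are the basic psi's."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def representative(op, basis):
    """(r|op|s) = phi_r op psi_s, eq. (13)."""
    return basis.conj().T @ op @ basis

def product_representative(rep_a, rep_b):
    """(r|alpha beta|t) = sum over s of (r|alpha|s)(s|beta|t), eq. (17)."""
    n = rep_a.shape[0]
    out = np.zeros((n, n), dtype=complex)
    for r in range(n):
        for t in range(n):
            out[r, t] = sum(rep_a[r, s] * rep_b[s, t] for s in range(n))
    return out

n = 4
rng = np.random.default_rng(8)
alpha = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
beta = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
basis = random_unitary(n, seed=9)

rep_product = product_representative(representative(alpha, basis),
                                     representative(beta, basis))
```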

The second of equations (9) may also be regarded as an example of matrix multiplication. For this purpose the representative of any $\psi$-vector must be regarded as forming a matrix with just one column. The product of such a matrix with a square matrix, the square matrix being on the left, is again a matrix with a single column. Equations (9) now show that the single-column matrix representing $\alpha\psi_x$ is equal to the product of the square matrix representing $\alpha$ with the single-column matrix representing $\psi_x$. In a corresponding way the second of equations (10) becomes an example of matrix multiplication if we regard the representative of any $\phi$-vector as forming a matrix with a single row. Finally, equation (8) gives yet another example of matrix multiplication. Its right-hand side may be regarded as a product of the single-row matrix representing $\phi_x$ with the single-column matrix representing $\psi_y$. By the matrix law of multiplication, this product is a matrix with one row and one column, i.e., an ordinary number.

The foregoing multiplication rules can be immediately extended to products of more than two factors, whether the factors are linear operators or $\psi$- or $\phi$-vectors. In every case the representative of the product is connected with those of the factors by the matrix law of multiplication. In consequence of this it is evident that the associative law of multiplication holds generally with all our symbols for linear operators and vectors, and that, for example,

$$\phi_x(\alpha\beta)\psi_y = (\phi_x\alpha)(\beta\psi_y).$$


In fact, all the laws of ordinary algebra hold with the exception of the commutative 
law of multiplication. 

The rules of our style of writing have by now become fairly clear. 
When a summation is made over any variable, this variable occurs in two 
consecutive positions, on the extreme right of one factor and on the extreme left 
of the next following factor. The consistent use of this style makes it extremely 
easy to remember formulas such as (1), (2), (8), (9), (10), (17) and many others 
which will come later. 

We define the conjugate complex $\bar\alpha$ of any linear operator $\alpha$ by

$$(r|\bar\alpha|s) = \overline{(s|\alpha|r)}, \qquad (18)$$

i.e., the matrix representing $\bar\alpha$ is obtained from that representing $\alpha$ by interchanging rows and columns and taking the conjugate complex of each element. This rule, it should be noticed, is formally the same as that connecting the single-row matrix representing any $\phi$ with the single-column matrix representing the conjugate imaginary $\psi$. We use the words ‘conjugate complex’ and not ‘conjugate imaginary’ when speaking of linear operators. A linear operator and its conjugate complex are quantities of the same nature, which can be added together, and one can give a meaning to real and† imaginary linear operators. A real linear operator, i.e., one equal to its conjugate complex, is, as we see from (18) with $\bar\alpha = \alpha$, just what we called in §8 a Hermitian operator. The general linear operator corresponds to a complex function of the dynamical variables. A real or Hermitian linear operator corresponds to a real function of the dynamical variables, which may be an observable. (It is an observable if it satisfies the expansion theorem.)

† ‘pure’ omitted.

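The conjugate complex of a product $\alpha\beta$ of two linear operators may be obtained from the formulas (18) and (17) in the following way:

$$(r|\overline{\alpha\beta}|s) = \overline{(s|\alpha\beta|r)} = \sum_t \overline{(s|\alpha|t)}\;\overline{(t|\beta|r)} = \sum_t (r|\bar\beta|t)(t|\bar\alpha|s) = (r|\bar\beta\bar\alpha|s).$$

Hence

$$\overline{\alpha\beta} = \bar\beta\bar\alpha. \qquad (19)$$

Thus to take the conjugate complex of a product of two linear operators, we must take the conjugate complex of each factor and reverse their order. The same rule applies to a product of three or more linear operators, as may be deduced by repeated applications of the rule for two linear operators, thus

$$\overline{\alpha\beta\gamma} = \overline{\alpha(\beta\gamma)} = \overline{\beta\gamma}\,\bar\alpha = \bar\gamma\bar\beta\bar\alpha.$$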
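In a finite-dimensional sketch, rule (18) is the conjugate transpose of the representative. Rule (19) and its three-factor extension can then be checked directly. The operators below are arbitrary complex matrices chosen for illustration, and `conjugate_complex` is a hypothetical helper name.

```python
import numpy as np

def conjugate_complex(rep):
    """(r|alpha-bar|s) = conj((s|alpha|r)), eq. (18): swap rows and columns, then conjugate."""
    return rep.conj().T

rng = np.random.default_rng(10)

def random_operator(n=3):
    """An arbitrary (generally non-Hermitian) linear operator."""
    return rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))

alpha, beta, gamma = random_operator(), random_operator(), random_operator()
hermitian = alpha + conjugate_complex(alpha)   # a real (Hermitian) linear operator
```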


The rule can easily be generalized still further to read: the conjugate complex or conjugate imaginary of any product of linear operators and $\psi$- and $\phi$-vectors is obtained by taking the conjugate complex or conjugate imaginary of each factor and reversing their order. The proof of the more general rule follows at once from the similarity of (7) and (18). This similarity allows us to infer, for instance by the same argument as led to (19), that the conjugate imaginary of $\alpha\psi_x$ is $\phi_x\bar\alpha$. We have already had examples of the general rule in the preceding chapter, in equations (4) and (10) and the result connected with equation (11). The last two examples are for the special case of $\alpha$ real.

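From (19) we see that if $\alpha$ and $\beta$ are two real linear operators, their product $\alpha\beta$ need not be real. This product can be split up into a real part

$$\tfrac12(\alpha\beta + \overline{\alpha\beta}) = \tfrac12(\alpha\beta + \beta\alpha)$$

and an† imaginary part

$$\tfrac12(\alpha\beta - \overline{\alpha\beta}) = \tfrac12(\alpha\beta - \beta\alpha).$$

Only when $\alpha$ and $\beta$ commute is the product $\alpha\beta$ also real.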
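Here is a small numerical check of this split, assuming Hermitian matrices for the real linear operators $\alpha$ and $\beta$. In the matrix sketch the imaginary part is anti-Hermitian, i.e. equal to minus its conjugate complex. The example matrices and the helper names are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(11)

def random_hermitian(n=3):
    """A real (Hermitian) linear operator built from a random complex matrix."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return m + m.conj().T

def split_product(alpha, beta):
    """Real and imaginary parts of alpha @ beta, following the text."""
    real_part = 0.5 * (alpha @ beta + beta @ alpha)
    imag_part = 0.5 * (alpha @ beta - beta @ alpha)
    return real_part, imag_part

alpha, beta = random_hermitian(), random_hermitian()
real_part, imag_part = split_product(alpha, beta)

# Commuting case: two real diagonal matrices, whose product is again real.
d1, d2 = np.diag([1.0, 2.0, 3.0]), np.diag([-1.0, 0.5, 4.0])
real_c, imag_c = split_product(d1, d2)
```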


† ‘pure’ omitted.


16. Eigen-ψ's as Basic ψ's


The connexion between an observable and the Hermitian operator that represents it in the sense of §8 is so close that we can use the same letter to denote them both, without getting into confusion. We can, in fact, go further and count the observable and the Hermitian operator as both the same thing. We then say that an observable is a Hermitian operator, which can operate either to the right on $\psi$-vectors or to the left on $\phi$-vectors. This provides a concise and convenient manner of speaking. A further rule of notation which we shall adopt is to denote an eigenvalue of an observable by the same letter that denotes the observable itself, with one or more primes attached. Thus the various eigenvalues of the observable $\alpha$ will be denoted by $\alpha', \alpha'', \ldots, \alpha^{(r)}, \ldots$.

The representations that we have used up to the present have all been quite general. We must now consider the question of how to introduce a particular representation which shall be advantageous for some special problem. The idea for this is provided by the orthogonality theorem of §9. Let us take some observable $\xi$ and suppose for the present that its eigenvalues form a discrete set of numbers. Let us suppose further that it has only one independent eigen-$\psi$ belonging to any eigenvalue. If we now choose a normalized eigen-$\psi$ for each eigenvalue, we shall get a set of $\psi$'s which are all orthogonal to each other and normalized, and which are such that an arbitrary $\psi$ can be expanded in terms of them. They can therefore be taken as the basic $\psi$'s of a representation.

There will be one basic $\psi$ associated with each eigenvalue of $\xi$. The basic $\psi$ associated with an eigenvalue $\xi'$ we shall denote by $\psi(\xi')$. Also we shall use the eigenvalues associated with the various basic $\psi$'s as the labels for the corresponding coordinates, instead of the arbitrary labels $r, s, t$ of the two preceding sections. Thus the coordinates of a $\psi$ in our present representation will be written $(\xi'|), (\xi''|), \ldots$, those of a $\phi$, $(|\xi'), (|\xi''), \ldots$, and those of a linear operator $\alpha$ will be written like $(\xi'|\alpha|\xi'')$.

We can remove the restriction that there is only one independent eigen-$\psi$ of $\xi$ belonging to any eigenvalue. Suppose there are several independent eigen-$\psi$'s belonging to some eigenvalue $\xi'$. Out of all the eigen-$\psi$'s belonging to this eigenvalue we can choose a set whose members are all normalized and orthogonal to each other, and are such that any eigen-$\psi$ belonging to this eigenvalue can be expanded in terms of them. (This choice can in fact be made in an infinite number of ways.) Let us call the members of this set $\psi(\xi', a'), \psi(\xi', a''), \ldots$. The whole assembly of $\psi(\xi', a')$'s for all $\xi'$ and $a'$ may now be taken as the basic $\psi$'s of a representation. The natural notation for coordinates in this representation is $(\xi'a'|)$, $(|\xi'a')$, $(\xi'a'|\alpha|\xi''a'')$.

In this way we can set up a representation for which all the basic $\psi$'s are eigen-$\psi$'s of some observable $\xi$. Let us see what the representative of $\xi$ itself is in such a representation. We have from (11)

$$\xi\,\psi(\xi'', a'') = \sum_{\xi', a'} \psi(\xi', a')\,(\xi'a'|\xi|\xi''a'').$$

But since $\psi(\xi'', a'')$ is an eigen-$\psi$,

$$\xi\,\psi(\xi'', a'') = \xi''\,\psi(\xi'', a'').$$

Equating the right-hand sides of these two equations, we obtain

$$(\xi'a'|\xi|\xi''a'') = \xi''\,\delta_{\xi'\xi''}\,\delta_{a'a''} = \xi'\,\delta_{\xi'\xi''}\,\delta_{a'a''}, \qquad (20)$$

where the two-suffix $\delta$-symbols have the meaning (4).

The main feature of the matrix $(\xi'a'|\xi|\xi''a'')$ given by (20) is that all its matrix elements vanish except the diagonal ones, for which $\xi' = \xi''$ and $a' = a''$. Such a matrix is called a diagonal matrix. Thus we have obtained a representation in which the observable $\xi$ is represented by a diagonal matrix, or, as we may say for brevity, a representation in which $\xi$ is diagonal. The elements on the diagonal are just the eigenvalues of $\xi$. In the next chapter we shall do the corresponding work for the case when the eigenvalues of $\xi$ form a continuous range of numbers.† We shall then have the important and general result that we can set up a representation in which any given observable is represented by a diagonal matrix, whose diagonal elements are just the eigenvalues of the observable.
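Numerically, a representation in which a given observable $\xi$ is diagonal can be sketched with `numpy.linalg.eigh`, which returns an orthonormal set of eigenvectors for a Hermitian matrix. This is a finite-dimensional stand-in for the choice of basic $\psi$'s in the text, not part of it. The example deliberately gives $\xi$ a repeated eigenvalue, so that the extra label $a'$ is needed.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary matrix used to hide the eigenbasis of xi."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

U = random_unitary(4, seed=12)
xi = U @ np.diag([1.0, 1.0, 2.0, 3.0]) @ U.conj().T   # eigenvalue 1 is repeated

eigenvalues, eigvecs = np.linalg.eigh(xi)   # columns: orthonormal eigen-psi's
rep = eigvecs.conj().T @ xi @ eigvecs       # (xi'a'|xi|xi''a''), compare eq. (20)
```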

As an example of the usefulness of choosing a representation in which some given observable is diagonal, we shall prove the following theorem. Any linear operator that commutes with an observable $\xi$ commutes also with any function of $\xi$. The theorem is obviously true when the function is expressible as a power series. To prove it generally, let $\omega$ be the linear operator, so that we have the equation

$$\xi\omega - \omega\xi = 0. \qquad (21)$$

If we express this in terms of representatives in the above representation in which $\xi$ is diagonal, we get from (17) and (20)

$$\sum_{\xi''', a'''} \bigl\{\xi'\,\delta_{\xi'\xi'''}\,\delta_{a'a'''}\,(\xi'''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'''a''')\,\xi'''\,\delta_{\xi'''\xi''}\,\delta_{a'''a''}\bigr\} = 0,$$

or

$$\xi'\,(\xi'a'|\omega|\xi''a'') - (\xi'a'|\omega|\xi''a'')\,\xi'' = 0. \qquad (22)$$


The $\xi$ in (21) is represented in (22) by the multiplying factor $\xi'$ or $\xi''$, according as it stands on the left or on the right of $\omega$. This illustrates a useful general rule which we can apply whenever we have to take the representative of an equation involving a diagonal observable. From (22) we now obtain

$$(\xi'a'|\omega|\xi''a'') = 0 \quad \text{for } \xi' \neq \xi'' \qquad (23)$$

as the condition for $\omega$ to commute with $\xi$. If $f(\xi)$ denotes any function of $\xi$, its representative is, by the same argument as led to (20),

$$(\xi'a'|f(\xi)|\xi''a'') = f(\xi')\,\delta_{\xi'\xi''}\,\delta_{a'a''}. \qquad (24)$$

Using this, we obtain as the condition for $\omega$ to commute with $f(\xi)$, by the same argument as led to (23),

$$(\xi'a'|\omega|\xi''a'') = 0 \quad \text{for } f(\xi') \neq f(\xi''). \qquad (25)$$

Now (25) is obviously a consequence of (23), since $f(\xi') \neq f(\xi'')$ implies $\xi' \neq \xi''$. This proves the theorem.

† All the theorems and results of the present chapter will be obtained for the case of discrete eigenvalues only, the generalization to the case of continuous ranges of eigenvalues being left to the next chapter.
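The theorem can be illustrated in the representation in which $\xi$ is diagonal. This is a sketch: the 3-dimensional example, the block entries of $\omega$ and the choice $f = \sin$ are arbitrary assumptions. Any $\omega$ whose representative satisfies (23), i.e. is block diagonal with respect to the eigenvalues of $\xi$, commutes both with $\xi$ and with $f(\xi)$.

```python
import numpy as np

xi_diag = np.array([1.0, 1.0, 2.0])      # eigenvalues xi', with xi' = 1 repeated
xi = np.diag(xi_diag)                    # xi in its own representation, eq. (20)

# Representative of omega: zero whenever xi' != xi'', as in eq. (23),
# but otherwise arbitrary (complex and not Hermitian).
omega = np.zeros((3, 3), dtype=complex)
omega[:2, :2] = [[1 + 2j, -0.5j], [3.0, 0.25]]
omega[2, 2] = -4 + 1j

f_xi = np.diag(np.sin(xi_diag))          # f(xi) as in eq. (24), with f = sin
```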

As a special case of the theorem, we have the result that any observable that commutes with an observable $\xi$ also commutes with any function of $\xi$. This result appears as a physical necessity when we identify, as in §13, the condition of commutability of two observables with the condition of compatibility of the corresponding observations. Any observation that is compatible with the measurement of an observable $\xi$ must also be compatible with the measurement of $f(\xi)$, since any measurement of $\xi$ includes in itself a measurement of $f(\xi)$.

There is a converse theorem, which states that if two observables $\xi$ and $g$ are such that any linear operator that commutes with $\xi$ also commutes with $g$, then $g$ is a function of $\xi$. To prove it, we take a general linear operator $\omega$ that commutes with $\xi$ and use again the above representation in which $\xi$ is diagonal, so that we have equation (23). By hypothesis

$$g\omega - \omega g = 0,$$

and this, expressed in terms of representatives, gives us

$$\sum_{\xi''', a'''} \bigl\{(\xi'a'|g|\xi'''a''')(\xi'''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'''a''')(\xi'''a'''|g|\xi''a'')\bigr\} = 0,$$

which reduces, with the help of (23), to

$$\sum_{a'''} \bigl\{(\xi'a'|g|\xi''a''')(\xi''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'a''')(\xi'a'''|g|\xi''a'')\bigr\} = 0. \qquad (26)$$

Now the numbers $(\xi'a'|\omega|\xi'a'')$ are all arbitrary and independent, so that we can extract from (26) a great deal of information about the numbers $(\xi'a'|g|\xi''a'')$. If we take $\xi'$ differing from $\xi''$ in (26), we see at once that

$$(\xi'a'|g|\xi''a'') = 0 \quad \text{for } \xi' \neq \xi''.$$


Further, putting $\xi'' = \xi'$ in (26), we find that

$$(\xi'a'|g|\xi'a'') = 0 \quad \text{for } a' \neq a''$$

and

$$(\xi'a'|g|\xi'a') = (\xi'a''|g|\xi'a'').$$

Thus $(\xi'a'|g|\xi''a'')$ is of the form

$$(\xi'a'|g|\xi''a'') = g(\xi')\,\delta_{\xi'\xi''}\,\delta_{a'a''}, \qquad (27)$$

where $g(\xi')$ is some function of $\xi'$. It has to be real in order that the matrix representing the observable $g$ may be Hermitian. Comparing (27) with (24), we see that the observable $g$ is just that function of the observable $\xi$ that $g(\xi')$ is of the real variable $\xi'$.
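The converse theorem can be illustrated by its contrapositive in a small example. This is again a 3-dimensional sketch with arbitrarily chosen numbers. Suppose $\xi$ has a repeated eigenvalue and $g$ takes different values on two basic $\psi$'s that share that eigenvalue. Then $g$ is not a function of $\xi$, and one can exhibit an $\omega$ that commutes with $\xi$ but not with $g$. The helper `commutes` is hypothetical.

```python
import numpy as np

xi = np.diag([1.0, 1.0, 2.0])          # xi diagonal, eigenvalue 1 repeated

# omega mixes the two basic psi's with xi' = 1, so it satisfies eq. (23).
omega = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])

g_function = np.diag([5.0, 5.0, 7.0])      # of the form (27): g(1) = 5, g(2) = 7
g_not_function = np.diag([5.0, 6.0, 7.0])  # two different values for the same xi' = 1

def commutes(a, b):
    """True if the two matrices commute to numerical precision."""
    return np.allclose(a @ b, b @ a)
```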

A representation which we require to be such that a certain observable $\xi$ is diagonal still has a great deal of arbitrariness left in it if there is more than one independent eigen-$\psi$ of $\xi$ belonging to any eigenvalue. We can reduce this arbitrariness by taking a second observable $\eta$ that commutes with $\xi$ and requiring the basic $\psi$'s to be simultaneous eigen-$\psi$'s of $\xi$ and $\eta$. We then get a representation in which both $\xi$ and $\eta$ are diagonal. If there is more than one independent simultaneous eigen-$\psi$ belonging to any pair of eigenvalues $\xi'$, $\eta'$, we can introduce a third observable that commutes with both $\xi$ and $\eta$. We then require the basic $\psi$'s to be simultaneous eigen-$\psi$'s of all three, which will result in all three being diagonal. We can continue this process until eventually we have a representation for which the basic $\psi$'s are simultaneous eigen-$\psi$'s of a set of commuting observables. This set includes so many commuting observables that there is only one independent simultaneous eigen-$\psi$ of all of them belonging to any set of eigenvalues. Such a set of commuting observables will be called a complete set of commuting observables. This kind of representation is the most useful one in practice. In it each of the complete set of commuting observables will be diagonal. Further, the representation will be completely determined by the complete set of commuting observables, except for arbitrary phase factors. These arise from the fact that the basic $\psi$'s may be multiplied by arbitrary numbers of modulus unity without any of the conditions defining them being invalidated. We therefore conclude that there exists a representation in which each of any set of commuting observables is diagonal. If the set is a complete one, the representation is uniquely determined except for arbitrary phase factors in the basic $\psi$'s.
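Here is a numerical sketch of building a complete set. On its own, $\xi$ has repeated eigenvalues, but together with a commuting $\eta$ every pair $(\xi', \eta')$ occurs only once. To find the simultaneous eigen-$\psi$'s, the sketch diagonalizes a generic linear combination $\xi + t\eta$. This is a standard numerical device and is not taken from the text. The value of `t` and the eigenvalues are assumptions, chosen so that the combination has no repeated eigenvalues.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary matrix used to hide the common eigenbasis."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

U = random_unitary(4, seed=13)
xi = U @ np.diag([1.0, 1.0, 2.0, 2.0]) @ U.conj().T    # degenerate on its own
eta = U @ np.diag([3.0, 4.0, 3.0, 4.0]) @ U.conj().T   # commutes with xi

t = 0.618   # hypothetical generic coefficient
_, basis = np.linalg.eigh(xi + t * eta)   # columns: simultaneous eigen-psi's

xi_rep = basis.conj().T @ xi @ basis
eta_rep = basis.conj().T @ eta @ basis
pairs = {(round(float(x.real), 6), round(float(y.real), 6))
         for x, y in zip(np.diag(xi_rep), np.diag(eta_rep))}
```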

Let the observables $\xi_1, \xi_2, \ldots, \xi_u$ form a complete commuting set and consider the representation in which they are diagonal. Each of the basic $\psi$'s, $\psi(\xi_1', \xi_2', \ldots, \xi_u')$ say, will be specified by a set of eigenvalues $\xi_1', \xi_2', \ldots, \xi_u'$. We can use these eigenvalues for labelling coordinates. A coordinate of a $\psi$ or $\phi$ will thus be written $(\xi_1'\xi_2'\ldots\xi_u'|)$ or $(|\xi_1'\xi_2'\ldots\xi_u')$ respectively, which may be abridged to $(\xi'|)$ or $(|\xi')$ in work of a general theoretical nature. Similarly a coordinate of a linear operator $\alpha$ will be written $(\xi_1'\xi_2'\ldots\xi_u'|\alpha|\xi_1''\xi_2''\ldots\xi_u'')$, or, abridged, $(\xi'|\alpha|\xi'')$. One of the $\xi$'s, say $\xi_m$, will itself be represented, according to (20), by

$$(\xi'|\xi_m|\xi'') = \xi_m'\,\delta_{\xi'\xi''}, \qquad (28)$$

where $\delta_{\xi'\xi''}$ has the meaning

$$\delta_{\xi'\xi''} = \delta_{\xi_1'\xi_1''}\,\delta_{\xi_2'\xi_2''}\cdots\delta_{\xi_u'\xi_u''}. \qquad (29)$$
The existence of arbitrary phase factors in the basic $\psi$'s means that we can multiply each $\psi(\xi_1', \xi_2', \ldots, \xi_u')$ by a numerical factor of the form $e^{i\gamma'}$, where $\gamma' = \gamma(\xi_1', \xi_2', \ldots, \xi_u')$ is any real function of the variables $\xi_1', \xi_2', \ldots, \xi_u'$. Such a change in the representation would require us to multiply three kinds of representative: $(\xi'|)$ of any $\psi$-vector by $e^{-i\gamma'}$, $(|\xi')$ of any $\phi$-vector by $e^{i\gamma'}$, and $(\xi'|\alpha|\xi'')$ of any linear operator $\alpha$ by $e^{i(\gamma''-\gamma')}$, where $\gamma'' = \gamma(\xi_1'', \xi_2'', \ldots, \xi_u'')$. A diagonal element of a linear operator remains unaltered. This is necessary when the linear operator corresponds to an observable, on account of the diagonal element's physical meaning as an average. For most purposes the arbitrary phase factors which exist in a representation are unimportant and trivial, so that we may count a representation as being completely determined by the observables that are diagonal in it. This fact is already implied in our notation, since the only indication in a representative of the representation to which it belongs is given by the letters denoting the observables that are diagonal.
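The effect of the phase factors can be checked in a finite-dimensional sketch, where the phases below are arbitrary assumptions. Multiplying each basic $\psi$ by $e^{i\gamma'}$ changes the $\psi$ coordinates by $e^{-i\gamma'}$ and the operator elements by $e^{i(\gamma''-\gamma')}$. It leaves the diagonal elements alone and leaves the vector itself unchanged.

```python
import numpy as np

def random_unitary(n, seed):
    """Return an n x n unitary; its columns are the original basic psi's."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

n = 3
basis = random_unitary(n, seed=14)            # old basic psi's (columns)
rng = np.random.default_rng(15)
gamma = rng.uniform(0.0, 2 * np.pi, size=n)   # hypothetical real phases gamma'
new_basis = basis * np.exp(1j * gamma)        # column r multiplied by e^{i gamma_r}

m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
alpha = m + m.conj().T                        # an observable (Hermitian)
psi = rng.normal(size=n) + 1j * rng.normal(size=n)

old_coords = basis.conj().T @ psi             # (xi'|) in the old representation
new_coords = new_basis.conj().T @ psi         # (xi'|) after the phase change
old_rep = basis.conj().T @ alpha @ basis
new_rep = new_basis.conj().T @ alpha @ new_basis
```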

Let $f(\xi) = f(\xi_1, \xi_2, \ldots, \xi_u)$ denote any function of the $\xi$'s, defined according to our general theory of functions of observables given in §§11 and 13. Then we find for its representative, by the same argument as led to (20) or (24),

$$(\xi'|f(\xi)|\xi'') = f(\xi')\,\delta_{\xi'\xi''}. \qquad (30)$$

Thus any function of the observables $\xi$ is represented by a diagonal matrix. Conversely, any Hermitian diagonal matrix represents a function of the $\xi$'s, since a general Hermitian diagonal matrix, $g$ say, has for its elements

$$(\xi'|g|\xi'') = g(\xi')\,\delta_{\xi'\xi''},$$

where $g(\xi')$ is some function of the variables $\xi'$, which has to be a real function from the Hermitian condition. This matrix must therefore represent that observable which is the function $g$ of the observables $\xi$.

If $\omega$ is a Hermitian operator that commutes with one of the $\xi$'s, say $\xi_m$, we see from (23) that $(\xi'|\omega|\xi'')$ vanishes except when $\xi_m' = \xi_m''$. If $\omega$ commutes with all the $\xi$'s, $(\xi'|\omega|\xi'')$ must vanish except when $\xi_l' = \xi_l''$ for all $l$. This means that $\omega$ is represented by a Hermitian diagonal matrix and hence is a function of the $\xi$'s. We, therefore, have the theorem that any Hermitian operator which commutes with each of a complete set of commuting observables is a function of those observables.

It is easily seen that the theorem proved on pages 52 and 53 and its converse are still valid when we replace the observable $\xi$ by any set of commuting observables. That theorem states that any linear operator that commutes with an observable $\xi$ commutes also with any function of $\xi$.


17. Transformation Theory 


Let us take two representations, one with a complete set of commuting observables $\xi$ diagonal and the other with another complete set of commuting observables $\eta$ diagonal. We call them the $\xi$-representation and $\eta$-representation respectively. The basic $\psi$'s in the two representations we shall denote for brevity by $\psi(\xi')$ and $\psi(\eta')$. An arbitrary $\psi$ will now have the two representatives $(\xi'|)$ and $(\eta'|)$, which are functions of the sets of variables $\xi'$ and $\eta'$ respectively, defined by

$$\psi = \sum_{\xi'} \psi(\xi')\,(\xi'|) \qquad (31)$$

and

$$\psi = \sum_{\eta'} \psi(\eta')\,(\eta'|). \qquad (32)$$


Since a $\psi$ is completely determined by its representative in any one representation, there must be a connexion between the two representatives $(\xi'|)$ and $(\eta'|)$ such that either is determined by the other. We shall investigate the form of this connexion.

Each basic $\psi$ of the $\eta$-representation, $\psi(\eta')$, will itself have a representative in the $\xi$-representation. We may write this representative $(\xi'|\eta')$, with $\eta'$ on the right to show which $\psi$ it represents. We shall then have

$$\psi(\eta') = \sum_{\xi'} \psi(\xi')\,(\xi'|\eta') \qquad (33)$$

for the definition of $(\xi'|\eta')$. Substituting this expression for $\psi(\eta')$ in the right-hand side of (32), we get

$$\psi = \sum_{\xi', \eta'} \psi(\xi')\,(\xi'|\eta')(\eta'|),$$

which gives, on comparison with (31),

$$(\xi'|) = \sum_{\eta'} (\xi'|\eta')(\eta'|). \qquad (34)$$


This is the transformation equation which gives the $\xi$-representative of a $\psi$ in terms of its $\eta$-representative. The corresponding equation which gives $(\eta'|)$ in terms of $(\xi'|)$ may be shown in the same way to be

$$(\eta'|) = \sum_{\xi'} (\eta'|\xi')(\xi'|), \qquad (35)$$

where $(\eta'|\xi')$ is the representative of the basic $\psi$, $\psi(\xi')$, in the $\eta$-representation. The two representatives $(\xi'|)$ and $(\eta'|)$ are thus linear functions of one another. The expressions $(\xi'|\eta')$ and $(\eta'|\xi')$ which enable us to pass from one to the other will be called transformation functions. They are each functions of the two sets of variables $\xi'$ and $\eta'$. We can obtain an explicit expression for $(\xi'|\eta')$ by multiplying equation (33) on the left by $\phi(\xi')$, corresponding to the way we obtained (5). The result is

$$(\xi'|\eta') = \phi(\xi')\,\psi(\eta'). \qquad (36)$$

Similarly it may be shown that

$$(\eta'|\xi') = \phi(\eta')\,\psi(\xi'). \qquad (37)$$

Hence $(\xi'|\eta')$ and $(\eta'|\xi')$ are conjugate complex numbers.
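Here is a sketch of the transformation functions in finite dimensions. The $\xi$- and $\eta$-representations are assumed to come from the eigenvectors of two arbitrary, generally non-commuting Hermitian matrices. `T[i, j]` stands for $(\xi'|\eta')$, with $\xi'$ and $\eta'$ indexed by `i` and `j`. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(16)

def random_hermitian(n):
    """A random Hermitian matrix standing in for a complete set of observables."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return m + m.conj().T

n = 4
_, xi_basis = np.linalg.eigh(random_hermitian(n))    # columns: psi(xi')
_, eta_basis = np.linalg.eigh(random_hermitian(n))   # columns: psi(eta')

T = xi_basis.conj().T @ eta_basis        # (xi'|eta') = phi(xi') psi(eta'), eq. (36)
T_back = eta_basis.conj().T @ xi_basis   # (eta'|xi') = phi(eta') psi(xi'), eq. (37)

psi = rng.normal(size=n) + 1j * rng.normal(size=n)
xi_rep = xi_basis.conj().T @ psi         # (xi'|), from eq. (31)
eta_rep = eta_basis.conj().T @ psi       # (eta'|), from eq. (32)
```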

The transformation functions must satisfy certain conditions in order that (34) and (35) may be consistent. If we substitute for $(\eta'|)$ in (34) its value given by (35), we get

$$(\xi'|) = \sum_{\eta', \xi''} (\xi'|\eta')(\eta'|\xi'')(\xi''|).$$

This must hold with $(\xi'|)$ an arbitrary function of the variables $\xi'$, and hence

$$\sum_{\eta'} (\xi'|\eta')(\eta'|\xi'') = \delta_{\xi'\xi''}, \qquad (38)$$

the $\delta$ symbol being defined by (29). The corresponding equation in which $\xi$ and $\eta$ have changed places, namely

$$\sum_{\xi'} (\eta'|\xi')(\xi'|\eta'') = \delta_{\eta'\eta''}, \qquad (39)$$

may be deduced in the same way.
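In matrix language, conditions (38) and (39) say that the matrix of transformation functions is unitary. In the following finite-dimensional sketch the two representations are built from the eigenvectors of two random Hermitian matrices. That construction and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(17)

def random_hermitian(n):
    """A random Hermitian matrix standing in for a complete set of observables."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return m + m.conj().T

n = 5
_, xi_basis = np.linalg.eigh(random_hermitian(n))    # columns: psi(xi')
_, eta_basis = np.linalg.eigh(random_hermitian(n))   # columns: psi(eta')
T = xi_basis.conj().T @ eta_basis                    # T[i, j] = (xi'|eta')
T_back = T.conj().T                                  # T_back[j, i] = (eta'|xi')

lhs_38 = T @ T_back     # sum over eta' of (xi'|eta')(eta'|xi''), eq. (38)
lhs_39 = T_back @ T     # sum over xi' of (eta'|xi')(xi'|eta''), eq. (39)
```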

Let us now consider the transformation of the representatives of $\phi$'s. We may deal with these analogously to the $\psi$'s. We have as the equation which gives the representative $(|\eta')$ of an arbitrary $\phi$ in terms of its representative $(|\xi')$

$$(|\eta') = \sum_{\xi'} (|\xi')(\xi'|\eta'),$$

where the quantity $(\xi'|\eta')$ is now defined as the $\eta$-representative of the basic $\phi$, $\phi(\xi')$, i.e., by the equation

$$\phi(\xi') = \sum_{\eta'} (\xi'|\eta')\,\phi(\eta').$$

If we multiply this equation by $\psi(\eta'')$ on the right, we obtain, as an explicit expression for this $(\xi'|\eta'')$,

$$\phi(\xi')\,\psi(\eta'') = (\xi'|\eta''),$$

which is the same as (36). Hence this quantity $(\xi'|\eta')$, defined as the $\eta$-representative of $\phi(\xi')$, is the same as our previous one defined as the $\xi$-representative of $\psi(\eta')$. Our notation of using the same symbol for them both is therefore justified.

We can now understand the symmetry, referred to in §14, in the way in which a coordinate $(r|x)$ of any $\psi$, $\psi_x$, involves two parameters. On the one hand it involves the parameter $r$ specifying one particular coordinate, and on the other the parameter $x$ specifying the $\psi$ whose coordinates we are considering. If $\psi_x$ is normalized, we may consider it as one of the basic $\psi$'s of a new representation. The coordinates $(r|x)$ will then give us that part of the transformation function which refers to this one of the new basic $\psi$'s. Each $(r|x)$ may be considered either as the $r$-th coordinate of $\psi_x$ or as the $x$-th coordinate in the new representation of the basic $\phi$, $\phi_r$, of the original representation.

The double meaning for a transformation function (€’|7') also enables us 
to understand better the significance of equations (38) and (39). If in (88) 
we regard (£'|7’) as the 7-representative of @(£’) and (7'|€”) as the n-representative 
of w(€"), the left-hand side of (38) becomes the 7-representative of the product 
0(&')w(E"), so the equation itself becomes equivalent to 


H(E)U(E") = deren. (40) 


Thus equation (38) just expresses, in terms of representatives in the η-representation, the condition (40), equivalent to (3), that the basic ψ's of the ξ-representation are all orthogonal and normalized. Similarly equation (39) expresses, in terms of representatives in the ξ-representation, the condition that the basic ψ's of the η-representation are all orthogonal and normalized. Equations (38) and (39), together with the condition that (ξ'|η') and (η'|ξ') must be conjugate complex, are the only conditions imposed on the transformation functions by general theoretical requirements.
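In matrix language, (38) and (39) together with the conjugate-complex relation say that the array of transformation functions (ξ'|η') is unitary. As a minimal sketch, assuming a finite case in which ξ and η each take N = 4 values and taking, purely as an illustrative choice, the discrete Fourier matrix for (ξ'|η'), the following pure-Python check confirms both conditions. The helper names (`dagger`, `matmul`, `is_identity`) are hypothetical.

```python
import cmath
import math

N = 4  # assumed number of eigenvalues of xi and of eta

# Assumed transformation function (xi'|eta'): row index xi', column index eta'.
T = [[cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for k in range(N)]
     for j in range(N)]

def dagger(a):
    """(eta'|xi') as the conjugate complex of (xi'|eta'), i.e. the conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def matmul(a, b):
    """Sum over the shared middle label, as in the left-hand sides of (38) and (39)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def is_identity(m, tol=1e-12):
    """True if m equals the Kronecker delta within tol."""
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(m)) for j in range(len(m[0])))

eq38 = matmul(T, dagger(T))   # sum over eta' of (xi'|eta')(eta'|xi'')
eq39 = matmul(dagger(T), T)   # sum over xi' of (eta'|xi')(xi'|eta'')
print(is_identity(eq38), is_identity(eq39))
```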

Owing to the arbitrary phase factors occurring in representations, there will be a corresponding arbitrariness in the transformation functions. If the basic ψ's ψ(ξ'), ψ(η') are multiplied by exp[if(ξ')], exp[ig(η')] respectively, f and g being




arbitrary real functions, the transformation function (ξ'|η') will get multiplied by

$$\exp\{-i[f(\xi') - g(\eta')]\}.$$

Thus the modulus of the transformation function is quite definite, the indeterminacy being only in its phase.
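The effect of these phase factors can be checked in a finite setting. This sketch assumes a 3 × 3 Fourier matrix for (ξ'|η') and arbitrary illustrative values for f(ξ') and g(η'). It confirms that the moduli are unchanged while the phases move, and that the re-phased function still satisfies (38).

```python
import cmath
import math

N = 3
T = [[cmath.exp(2j * math.pi * j * k / N) / math.sqrt(N) for k in range(N)]
     for j in range(N)]   # assumed (xi'|eta'), a unitary example

f = [0.3, -1.2, 2.0]  # assumed real phase function f(xi'), one value per xi'
g = [0.7, 0.1, -0.5]  # assumed real phase function g(eta'), one value per eta'

# psi(xi') -> exp[i f] psi(xi') makes phi(xi') -> exp[-i f] phi(xi'), hence this factor.
T_new = [[T[j][k] * cmath.exp(-1j * (f[j] - g[k])) for k in range(N)] for j in range(N)]

moduli_unchanged = all(abs(abs(T_new[j][k]) - abs(T[j][k])) < 1e-12
                       for j in range(N) for k in range(N))
phase_changed = any(abs(T_new[j][k] - T[j][k]) > 1e-6 for j in range(N) for k in range(N))
# The re-phased function still satisfies (38).
still_38 = all(abs(sum(T_new[j][k] * T_new[l][k].conjugate() for k in range(N))
                   - (1 if j == l else 0)) < 1e-12
               for j in range(N) for l in range(N))
print(moduli_unchanged, phase_changed, still_38)
```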

The connexion between the representatives of a linear operator α in the two representations may easily be obtained in a variety of different ways. We can, for instance, use the explicit expression for the representative of α given by equation (13). Applying this to the ξ-representation, we get

$$(\xi'|\alpha|\xi'') = \phi(\xi')\,\alpha\,\psi(\xi'').$$

If we now substitute for the right-hand side, which consists of the symbolic product of three factors, its representative in the η-representation, we get

$$(\xi'|\alpha|\xi'') = \sum_{\eta',\eta''} (\xi'|\eta')(\eta'|\alpha|\eta'')(\eta''|\xi''), \qquad (41)$$

which gives the ξ-representative in terms of the η-representative. Similarly we may obtain the result

$$(\eta'|\alpha|\eta'') = \sum_{\xi',\xi''} (\eta'|\xi')(\xi'|\alpha|\xi'')(\xi''|\eta''), \qquad (42)$$

giving the η-representative in terms of the ξ-representative. These are the transformation equations for the representatives of a linear operator. Either representative is a linear function of the other, and the same transformation functions are required for passing from one to the other as for the representatives of ψ's and φ's.
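In matrix form, (41) reads A_ξ = T A_η T† and (42) reads A_η = T† A_ξ T, where T is the array of (ξ'|η') and T† that of (η'|ξ'). The sketch below assumes a two-valued ξ and η, a particular unitary T and a particular Hermitian (η'|α|η''), all chosen only for illustration. It checks that (42) undoes (41), and that the trace, a basis-independent quantity, is the same in both representations.

```python
import cmath
import math

def matmul(a, b):
    """Matrix product: a sum over the shared middle label, as in (41) and (42)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    """Conjugate transpose: turns the array of (xi'|eta') into that of (eta'|xi')."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]

t, p = 0.4, 1.1
T = [[math.cos(t), -math.sin(t) * cmath.exp(1j * p)],
     [math.sin(t) * cmath.exp(-1j * p), math.cos(t)]]   # assumed unitary (xi'|eta')
A_eta = [[2.0, 1 - 1j], [1 + 1j, -0.5]]                 # assumed (eta'|alpha|eta'')

A_xi = matmul(matmul(T, A_eta), dagger(T))               # equation (41)
A_eta_back = matmul(matmul(dagger(T), A_xi), T)          # equation (42)

max_error = max(abs(A_eta_back[i][j] - A_eta[i][j]) for i in range(2) for j in range(2))
trace_xi = A_xi[0][0] + A_xi[1][1]
print(max_error < 1e-12, trace_xi)
```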

If we now take a third representation, ζ say, we shall have transformation functions (ζ'|ξ'), (ξ'|ζ') connecting it with the ξ-representation, and transformation functions (ζ'|η'), (η'|ζ') connecting it with the η-representation. There are simple relations between the transformation functions. Equation (36), with ζ instead of η, gives us

$$(\xi'|\zeta') = \phi(\xi')\,\psi(\zeta').$$

If we substitute for the right-hand side, which consists of the symbolic product of two factors, its representative in the η-representation, we get

$$(\xi'|\zeta') = \sum_{\eta'} (\xi'|\eta')(\eta'|\zeta'). \qquad (43)$$

The conjugate complex equation, which could be deduced independently in the same way, is

$$(\zeta'|\xi') = \sum_{\eta'} (\zeta'|\eta')(\eta'|\xi'). \qquad (44)$$






Equations (43) and (44) give the ξ & ζ transformation functions in terms of the ξ & η and η & ζ ones.
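Relations (43) and (44) can be checked directly from the definition (ξ'|ζ') = φ(ξ')ψ(ζ'). The sketch assumes three orthonormal bases of a two-dimensional space, given as coordinate lists (kets), and computes each transformation function as a symbolic product φψ through a hypothetical `inner` helper. The sum over η' then reproduces the direct ξ & ζ function because the η basis is complete.

```python
import cmath
import math

def inner(a, b):
    """Symbolic product phi_a psi_b for kets given by coordinates: conj(a) . b."""
    return sum(complex(u).conjugate() * v for u, v in zip(a, b))

def transformation_function(bras, kets):
    """Array of (a'|b') = phi(a') psi(b'); rows from `bras`, columns from `kets`."""
    return [[inner(a, b) for b in kets] for a in bras]

def matmul(a, b):
    """Sum over the shared middle label."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def max_diff(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(len(a)) for j in range(len(a[0])))

t, p, s = 0.4, 1.1, 1 / math.sqrt(2)
xi = [[1, 0], [0, 1]]                                          # assumed psi(xi')
eta = [[math.cos(t), math.sin(t) * cmath.exp(-1j * p)],
       [-math.sin(t) * cmath.exp(1j * p), math.cos(t)]]        # assumed psi(eta')
zeta = [[s, 1j * s], [s, -1j * s]]                             # assumed psi(zeta')

err43 = max_diff(transformation_function(xi, zeta),
                 matmul(transformation_function(xi, eta), transformation_function(eta, zeta)))
err44 = max_diff(transformation_function(zeta, xi),
                 matmul(transformation_function(zeta, eta), transformation_function(eta, xi)))
print(err43 < 1e-12, err44 < 1e-12)
```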

It is sometimes convenient to use a mixed representative of an observable or linear operator, that is to say, a representative in the form of a matrix whose rows and columns refer to two different representations. We define the mixed representative (ξ'|α|η') of α by

$$(\xi'|\alpha|\eta') = \sum_{\xi''} (\xi'|\alpha|\xi'')(\xi''|\eta'). \qquad (45)$$

With the help of (41) we can express this mixed representative in terms of (η'|α|η''), the result being

$$(\xi'|\alpha|\eta') = \sum_{\eta'',\eta''',\xi''} (\xi'|\eta'')(\eta''|\alpha|\eta''')(\eta'''|\xi'')(\xi''|\eta'),$$

which reduces, with the help of (39), to

$$(\xi'|\alpha|\eta') = \sum_{\eta'',\eta'''} (\xi'|\eta'')(\eta''|\alpha|\eta''')\,\delta_{\eta'''\eta'} = \sum_{\eta''} (\xi'|\eta'')(\eta''|\alpha|\eta'). \qquad (46)$$


Equation (46) may be taken as an alternative definition of the mixed representative (ξ'|α|η'). By similar pieces of algebraic work one can verify that the ordinary representatives are given in terms of the mixed representative by

$$(\xi'|\alpha|\xi'') = \sum_{\eta'} (\xi'|\alpha|\eta')(\eta'|\xi''), \qquad (\eta'|\alpha|\eta'') = \sum_{\xi'} (\eta'|\xi')(\xi'|\alpha|\eta'').$$

The rows and the columns of a mixed representative are in general quite unrelated, so that no meaning can be given to its diagonal elements. The columns of one mixed representative may, however, refer to the same representation as, and thus be labelled to correspond to, the rows of a second mixed representative, and we then have a multiplication law of the form

$$\sum_{\eta'} (\xi'|\alpha|\eta')(\eta'|\beta|\zeta') = (\xi'|\alpha\beta|\zeta'), \qquad (47)$$

as is easily verified.
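A direct numerical check of (45), (46) and (47) is possible if the mixed representative is computed straight from its meaning φ(ξ')αψ(η'). The following sketch assumes a two-dimensional setup with three orthonormal bases and two illustrative Hermitian operators α and β. `mixed` is a hypothetical helper, not part of any library.

```python
import cmath
import math

def inner(a, b):
    """Symbolic product phi_a psi_b for kets given by coordinates: conj(a) . b."""
    return sum(complex(u).conjugate() * v for u, v in zip(a, b))

def apply(op, ket):
    """Result of the linear operator op acting on the ket."""
    return [sum(op[i][k] * ket[k] for k in range(len(ket))) for i in range(len(op))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mixed(op, bras, kets):
    """(a'|op|b') = phi(a') op psi(b'): rows from one basis, columns from another."""
    return [[inner(a, apply(op, b)) for b in kets] for a in bras]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(a)) for j in range(len(a[0])))

t, p, s = 0.4, 1.1, 1 / math.sqrt(2)
xi_basis = [[1, 0], [0, 1]]                                    # assumed psi(xi')
eta_basis = [[math.cos(t), math.sin(t) * cmath.exp(-1j * p)],
             [-math.sin(t) * cmath.exp(1j * p), math.cos(t)]]  # assumed psi(eta')
zeta_basis = [[s, 1j * s], [s, -1j * s]]                       # assumed psi(zeta')
identity = [[1, 0], [0, 1]]
alpha = [[2.0, 1 - 1j], [1 + 1j, -0.5]]                        # assumed operators
beta = [[0.3, 2j], [-2j, 1.0]]

T = mixed(identity, xi_basis, eta_basis)                       # (xi'|1|eta') = (xi'|eta')
m = mixed(alpha, xi_basis, eta_basis)
ok45 = close(m, matmul(mixed(alpha, xi_basis, xi_basis), T))
ok46 = close(m, matmul(T, mixed(alpha, eta_basis, eta_basis)))
ok47 = close(matmul(m, mixed(beta, eta_basis, zeta_basis)),
             mixed(matmul(alpha, beta), xi_basis, zeta_basis))
print(ok45, ok46, ok47)
```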




The identical operator has for its mixed representative

$$(\xi'|1|\eta') = (\xi'|\eta'),$$

as follows at once from (45) or (46) and (14) with k = 1. We thus get a new meaning for the transformation function as the mixed representative of the identical operator. Further, we obtain from (45) and (28)

$$(\xi'|\xi|\eta') = \xi'(\xi'|\eta'), \qquad (48)$$

or more generally, from (45) and (30)

$$(\xi'|f(\xi)|\eta') = f(\xi')(\xi'|\eta').$$

Similarly, using (46) instead of (45), we find

$$(\xi'|\eta|\eta') = (\xi'|\eta')\,\eta', \qquad (49)$$

$$(\xi'|g(\eta)|\eta') = (\xi'|\eta')\,g(\eta').$$

Finally, we have, using a multiplication law of the type (47),

$$(\xi'|f(\xi)g(\eta)|\eta') = \sum_{\xi''} (\xi'|f(\xi)|\xi'')(\xi''|g(\eta)|\eta') = f(\xi')(\xi'|\eta')g(\eta'). \qquad (50)$$
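Formulas (48) to (50) can be illustrated by building f(ξ) and g(η) from assumed eigenvalues and eigenvectors. In this sketch the eigenvalues, the functions f(x) = x² and g = exp, and the η basis are all illustrative assumptions. `function_of` is a hypothetical helper that forms the sum of func(value)·ψφ over a basis.

```python
import cmath
import math

def inner(a, b):
    return sum(complex(u).conjugate() * v for u, v in zip(a, b))

def apply(op, ket):
    return [sum(op[i][k] * ket[k] for k in range(len(ket))) for i in range(len(op))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mixed(op, bras, kets):
    """(a'|op|b') = phi(a') op psi(b')."""
    return [[inner(a, apply(op, b)) for b in kets] for a in bras]

def function_of(values, basis, func):
    """Hypothetical helper: the operator sum over k of func(values[k]) psi_k phi_k."""
    n = len(basis)
    return [[sum(func(values[k]) * basis[k][i] * complex(basis[k][j]).conjugate()
                 for k in range(n)) for j in range(n)] for i in range(n)]

t, p = 0.4, 1.1
xi_basis = [[1, 0], [0, 1]]
eta_basis = [[math.cos(t), math.sin(t) * cmath.exp(-1j * p)],
             [-math.sin(t) * cmath.exp(1j * p), math.cos(t)]]
xi_values, eta_values = [1.0, -2.0], [0.5, 3.0]     # assumed eigenvalues
f, g = (lambda x: x ** 2), math.exp                 # assumed functions

F = function_of(xi_values, xi_basis, f)
G = function_of(eta_values, eta_basis, g)
T = [[inner(a, b) for b in eta_basis] for a in xi_basis]   # (xi'|eta')

def matches(m, expected, tol=1e-9):
    return all(abs(m[i][k] - expected(i, k)) < tol for i in range(2) for k in range(2))

ok48 = matches(mixed(F, xi_basis, eta_basis), lambda i, k: f(xi_values[i]) * T[i][k])
ok49 = matches(mixed(G, xi_basis, eta_basis), lambda i, k: T[i][k] * g(eta_values[k]))
ok50 = matches(mixed(matmul(F, G), xi_basis, eta_basis),
               lambda i, k: f(xi_values[i]) * T[i][k] * g(eta_values[k]))
print(ok48, ok49, ok50)
```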


18. Probability Amplitudes 


In §12 we obtained the probability of an observable having any specified value for a given state, and in §13 we generalized this result and obtained the probability of a set of commuting observables simultaneously having specified values for a given state. Let us now apply this result to a complete set of commuting observables, say the set of ξ's which we have been dealing with already. According to equation (40) of §13, we must take a normalized ψ representing the given state and expand it in terms of simultaneous eigen-ψ's of all the ξ's. Equation (31) can now be used as this expansion, provided the ψ on the left-hand side of (31) is normalized, since on the right-hand side of (31) there is just one term corresponding to any set of eigenvalues ξ'. The difference in form between equation (40) of §13 and equation (31) of §17 consists merely in the simultaneous eigen-ψ's in the latter equation being written as normalized ψ's with numerical coefficients. If we now apply the result (41) of §13, we see that the probability,






for the state represented by the ψ of (31), of the ξ's simultaneously all having specified values ξ', is

$$|(\xi'|x)|^2. \qquad (51)$$

Thus the probability of a complete set of commuting observables having specified values for a given state is equal to the square of the modulus of the coordinate, corresponding to these specified values, of a normalized ψ representing the state, in a representation in which each of the complete set of commuting observables is diagonal.

There is therefore a simple physical meaning for the modulus of the representative of a normalized ψ in a representation in which each of a complete set of commuting observables is diagonal. The existence of this physical meaning is the main reason why such representations are important. One may call the representative a probability amplitude, meaning by this a quantity whose modulus must be squared to give an ordinary probability. There is no correspondingly simple physical meaning for the argument of the representative; indeed one cannot expect any, owing to the indeterminacy of this argument associated with the arbitrary phase factors of the representation. When the ψ is not normalized, |(ξ'|x)|² will still be proportional to the probability of the ξ's having the values ξ', the proportionality holding over all values ξ'.

The probabilities that one calculates in practical problems in quantum 
mechanics are nearly always obtained as the squares of the moduli of probability 
amplitudes. Even when one is interested only in the probability of an incomplete 
set of commuting observables having specified values, it is usually necessary first 
to make the set a complete one by the introduction of some extra commuting 
observables and to obtain the probability of the complete set having specified 
values (as the square of the modulus of a probability amplitude), and then to sum 
over all possible values of the extra observables. A more direct application of 
formula (41) of §13 is usually not practicable. 

Let us now apply the formula (51) to a state which is one of the basic states of the η-representation, say the state represented by ψ(η'). This state is characterized physically by the property that a simultaneous measurement of all the η's for it is certain to lead to the set of results η'. From (51) we see that the probability of the ξ's having the values ξ' for this state is just |(ξ'|η')|², the square of the modulus of the transformation function. The transformation function is now itself the probability amplitude. Since |(ξ'|η')|² = |(η'|ξ')|², we have the theorem of reciprocity: the probability of the ξ's having the values ξ' for the state for which the η's certainly have the values η' is equal to the probability of the η's having the values η' for the state for which the ξ's certainly have the values ξ'. The probability amplitude for either of these probabilities is the transformation function (ξ'|η') or (η'|ξ').
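For a finite set of eigenvalues, formula (51) and the theorem of reciprocity reduce to a few lines of arithmetic. The sketch below assumes a two-valued ξ (the standard basis), a two-valued η (an illustrative rotated basis), and a normalized ψ_x whose ξ-representative is assumed to be (0.6, 0.8i).

```python
import cmath
import math

def inner(a, b):
    """Symbolic product phi_a psi_b for kets given by coordinates: conj(a) . b."""
    return sum(complex(u).conjugate() * v for u, v in zip(a, b))

t, p = 0.4, 1.1
xi_basis = [[1, 0], [0, 1]]                                    # assumed psi(xi')
eta_basis = [[math.cos(t), math.sin(t) * cmath.exp(-1j * p)],
             [-math.sin(t) * cmath.exp(1j * p), math.cos(t)]]  # assumed psi(eta')

psi_x = [0.6, 0.8j]                                            # assumed normalized psi_x
probabilities = [abs(inner(e, psi_x)) ** 2 for e in xi_basis]  # formula (51)

# Reciprocity: |(xi'|eta')|^2 equals |(eta'|xi')|^2 for every pair of labels.
reciprocal = all(abs(abs(inner(e, v)) ** 2 - abs(inner(v, e)) ** 2) < 1e-12
                 for e in xi_basis for v in eta_basis)
# In the state psi(eta'), the probabilities of the various xi' add up to one.
rows_sum_to_one = all(abs(sum(abs(inner(e, v)) ** 2 for e in xi_basis) - 1) < 1e-12
                      for v in eta_basis)
print(probabilities, reciprocal, rows_sum_to_one)
```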




The appearance of transformation functions as probability amplitudes makes the calculation of transformation functions a matter of practical importance. The general method of calculating the transformation function connecting two complete sets of commuting observables consists in first obtaining the representatives of one set in the representation in which the other set are diagonal. When we know these representatives, (ξ'|η|ξ'') say, we can write down the equations

$$\sum_{\xi''} (\xi'|\eta|\xi'')(\xi''|\eta') = (\xi'|\eta')\,\eta', \qquad (52)$$

which follow at once from (45) with α = η, and (49), and proceed to solve them as algebraic equations in the unknowns (ξ'|η') and possibly also the eigenvalues η'. They are just of the standard form of equations in the theory of eigenvalues, equivalent to (18) of Chapter II. The general solution (ξ'|η') of (52) will contain an arbitrary function of η' as a factor, and we must choose this function so as to satisfy the normalizing condition (39).
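For a 2 × 2 case, the equations (52) can be solved in closed form. The sketch assumes a Hermitian (ξ'|η|ξ'') with a non-zero off-diagonal element. The hypothetical `solve_52` returns the eigenvalues η' and the columns (ξ'|η'), each normalized so that (39) holds, with the arbitrary phase factor of each column simply set to 1. It then checks the residual of (52) and the orthonormality condition (39).

```python
import math

def solve_52(rep):
    """Hypothetical helper: solve (52) for a 2x2 Hermitian (xi'|eta|xi'').

    Returns the eigenvalues eta' and T with T[i][k] = (xi_i|eta_k), each column
    normalized according to (39); the free phase factor of each column is set to 1.
    """
    a, b, d = rep[0][0].real, rep[0][1], rep[1][1].real
    if abs(b) == 0:
        raise ValueError("eta is already diagonal in the xi-representation")
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    eigenvalues = [mean + radius, mean - radius]
    columns = []
    for lam in eigenvalues:
        v = [b, lam - a]                     # satisfies the first row of (52)
        norm = math.sqrt(sum(abs(c) ** 2 for c in v))
        columns.append([c / norm for c in v])
    T = [[columns[k][i] for k in range(2)] for i in range(2)]
    return eigenvalues, T

rep = [[1.0, 1 - 1j], [1 + 1j, -1.0]]        # assumed (xi'|eta|xi'')
eta_vals, T = solve_52(rep)

# Residual of (52): sum over xi'' of (xi'|eta|xi'')(xi''|eta') minus (xi'|eta') eta'.
residual = max(abs(sum(rep[i][j] * T[j][k] for j in range(2)) - T[i][k] * eta_vals[k])
               for i in range(2) for k in range(2))
# Condition (39): sum over xi' of (eta'|xi')(xi'|eta'') equals delta.
orth = max(abs(sum(T[i][k].conjugate() * T[i][l] for i in range(2)) - (1 if k == l else 0))
           for k in range(2) for l in range(2))
print(eta_vals, residual < 1e-12, orth < 1e-12)
```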


19. Example 


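As a simple example of the foregoing methods, let us consider three observables σ_x, σ_y, σ_z which satisfy the following relations:

$$\begin{aligned} \sigma_y\sigma_z - \sigma_z\sigma_y &= 2i\sigma_x, \\ \sigma_z\sigma_x - \sigma_x\sigma_z &= 2i\sigma_y, \\ \sigma_x\sigma_y - \sigma_y\sigma_x &= 2i\sigma_z, \end{aligned} \qquad (53)$$

and

$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = 1. \qquad (54)$$

This example is of importance for a study of the spin of the electron, as will be seen in §39.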

We note in the first place that equations (53) are permissible since, from the work at the end of §15, their left-hand sides are† imaginary and can therefore each be equated to 2i times an observable. We can get these equations into a simpler form with the help of some straightforward non-commutative algebra. From the first of equations (53) we obtain

$$\begin{aligned} 2i(\sigma_x\sigma_y + \sigma_y\sigma_x) &= (2i\sigma_x)\sigma_y + \sigma_y(2i\sigma_x) \\ &= (\sigma_y\sigma_z - \sigma_z\sigma_y)\sigma_y + \sigma_y(\sigma_y\sigma_z - \sigma_z\sigma_y) \\ &= -\sigma_z\sigma_y^2 + \sigma_y^2\sigma_z \\ &= 0 \end{aligned}$$


† [‘pure’ omitted.]






from (54). Hence σ_xσ_y = −σ_yσ_x.

Two observables or linear operators like these which satisfy the commutative law of multiplication except for a minus sign will be said to anticommute. Thus σ_x anticommutes with σ_y. From symmetry each of the three observables σ_x, σ_y, σ_z must anticommute with any other. Equations (53) may therefore be written

$$\begin{aligned} \sigma_y\sigma_z &= i\sigma_x = -\sigma_z\sigma_y, \\ \sigma_z\sigma_x &= i\sigma_y = -\sigma_x\sigma_z, \\ \sigma_x\sigma_y &= i\sigma_z = -\sigma_y\sigma_x, \end{aligned} \qquad (55)$$

and also from (54)

$$\sigma_x\sigma_y\sigma_z = i. \qquad (56)$$


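The three σ's may be considered as the components of a vector σ in three-dimensional space, and the algebraic equations which they satisfy would then be invariant under a rotation of axes. To verify this, let the components of σ referred to a new set of mutually perpendicular axes be

$$\begin{aligned} \sigma_1 &= l_1\sigma_x + m_1\sigma_y + n_1\sigma_z, \\ \sigma_2 &= l_2\sigma_x + m_2\sigma_y + n_2\sigma_z, \\ \sigma_3 &= l_3\sigma_x + m_3\sigma_y + n_3\sigma_z, \end{aligned}$$

the l's, m's and n's being the direction cosines of the new axes relative to the old ones. We then have from (54) and (55)

$$\begin{aligned} \sigma_1^2 &= (l_1\sigma_x + m_1\sigma_y + n_1\sigma_z)^2 \\ &= l_1^2\sigma_x^2 + m_1^2\sigma_y^2 + n_1^2\sigma_z^2 + l_1m_1(\sigma_x\sigma_y + \sigma_y\sigma_x) + m_1n_1(\sigma_y\sigma_z + \sigma_z\sigma_y) + n_1l_1(\sigma_z\sigma_x + \sigma_x\sigma_z) \\ &= l_1^2 + m_1^2 + n_1^2 = 1. \end{aligned}$$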


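Again

$$\begin{aligned} \sigma_2\sigma_3 &= (l_2\sigma_x + m_2\sigma_y + n_2\sigma_z)(l_3\sigma_x + m_3\sigma_y + n_3\sigma_z) \\ &= l_2l_3\sigma_x^2 + m_2m_3\sigma_y^2 + n_2n_3\sigma_z^2 + l_2m_3\sigma_x\sigma_y + m_2l_3\sigma_y\sigma_x + m_2n_3\sigma_y\sigma_z \\ &\quad + n_2m_3\sigma_z\sigma_y + n_2l_3\sigma_z\sigma_x + l_2n_3\sigma_x\sigma_z \\ &= l_2l_3 + m_2m_3 + n_2n_3 + i(l_2m_3 - m_2l_3)\sigma_z + i(m_2n_3 - n_2m_3)\sigma_x + i(n_2l_3 - l_2n_3)\sigma_y \\ &= i(l_1\sigma_x + m_1\sigma_y + n_1\sigma_z) = i\sigma_1, \end{aligned}$$

the last step following because the new axes are mutually perpendicular (so that l_2l_3 + m_2m_3 + n_2n_3 = 0) and, forming a right-handed system like the old ones, satisfy m_2n_3 − n_2m_3 = l_1, n_2l_3 − l_2n_3 = m_1, l_2m_3 − m_2l_3 = n_1. Thus σ_1, σ_2, σ_3 satisfy relations of the same form as (54) and (55).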
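The rotational invariance can also be checked numerically. The sketch below borrows the explicit matrices (57), derived later in this section, and assumes a right-handed set of new axes obtained from an illustrative rotation Rz(a)Ry(b)Rz(c), whose rows supply the direction cosines l_k, m_k, n_k. It checks that σ_1, σ_2, σ_3 again satisfy (54) and (55).

```python
import math

sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
identity = [[1, 0], [0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def scaled(c, a):
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(a)) for j in range(len(a[0])))

def component(l, m, n):
    """l*sigma_x + m*sigma_y + n*sigma_z as a 2x2 matrix."""
    return [[l * sx[i][j] + m * sy[i][j] + n * sz[i][j] for j in range(2)] for i in range(2)]

def rotation(a, b, c):
    """Assumed right-handed new axes: rows of Rz(a) Ry(b) Rz(c) are (l_k, m_k, n_k)."""
    rz = lambda t: [[math.cos(t), -math.sin(t), 0], [math.sin(t), math.cos(t), 0], [0, 0, 1]]
    ry = lambda t: [[math.cos(t), 0, math.sin(t)], [0, 1, 0], [-math.sin(t), 0, math.cos(t)]]
    return matmul(matmul(rz(a), ry(b)), rz(c))

R = rotation(0.3, 1.1, -0.7)
s1, s2, s3 = (component(*row) for row in R)
ok54 = all(close(matmul(s, s), identity) for s in (s1, s2, s3))
ok55 = (close(matmul(s2, s3), scaled(1j, s1)) and close(matmul(s3, s1), scaled(1j, s2))
        and close(matmul(s1, s2), scaled(1j, s3)))
print(ok54, ok55)
```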


The eigenvalues of σ_z² must be the squares of the eigenvalues of σ_z. Now from (54), σ_z² has only the one eigenvalue unity, and hence the eigenvalues of σ_z can be



only 1 and −1. Both these numbers must be eigenvalues of σ_z, otherwise σ_z would be equal to a number and could not anticommute with anything. Similarly, σ_x and σ_y each have as their eigenvalues 1 and −1.

Let us set up a matrix representation for our observables σ and let us take σ_z to be diagonal. If there are no other independent observables besides the σ's in our dynamical system, then σ_z by itself forms a complete set of commuting observables, since the form of equations (54) and (55) is such that we cannot construct out of σ_x, σ_y and σ_z any new observable that commutes with σ_z. The diagonal elements of the matrix representing σ_z being the eigenvalues 1 and −1 of σ_z, the matrix itself will be

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Let σ_x be represented by

$$\begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}.$$

This matrix must be Hermitian, so that a_1 and a_4 must be real and a_2 and a_3 conjugate complex numbers. The equation σ_zσ_x = −σ_xσ_z gives us

$$\begin{pmatrix} a_1 & a_2 \\ -a_3 & -a_4 \end{pmatrix} = \begin{pmatrix} -a_1 & a_2 \\ -a_3 & a_4 \end{pmatrix},$$

so that a_1 = a_4 = 0. Hence σ_x is represented by a matrix of the form

$$\begin{pmatrix} 0 & a_2 \\ a_3 & 0 \end{pmatrix}.$$

The equation σ_x² = 1 now shows that a_2a_3 = 1. Thus a_2 and a_3, being conjugate complex numbers, must be of the form e^{iα} and e^{−iα} respectively, where α is a real number, so that σ_x is represented by a matrix of the form

$$\begin{pmatrix} 0 & e^{i\alpha} \\ e^{-i\alpha} & 0 \end{pmatrix}.$$

Similarly it may be shown that σ_y is also represented by a matrix of this form. By suitably choosing the phase factors in the representation, which is not completely determined by the condition that σ_z shall be diagonal, we can arrange that σ_x shall be represented by the matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

The representative of σ_y is then determined by the equation σ_y = iσ_xσ_z. We thus obtain finally the three matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad (57)$$

to represent σ_x, σ_y and σ_z respectively, which matrices satisfy all the algebraic relations (53), (54), (55) and (56). The component of the vector σ in an arbitrary direction specified by the direction cosines l, m, n, namely lσ_x + mσ_y + nσ_z, is represented by




$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}. \qquad (58)$$
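As a quick consistency check, the sketch below multiplies out the matrices (57) and confirms (53) to (56). It also confirms that lσ_x + mσ_y + nσ_z takes the form (58) for one assumed set of direction cosines, l, m, n = 1/3, 2/3, 2/3.

```python
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
identity = [[1, 0], [0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def combine(*terms):
    """Linear combination: sum of c * M over the (c, M) pairs given."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

ok53 = (close(combine((1, matmul(sy, sz)), (-1, matmul(sz, sy))), combine((2j, sx)))
        and close(combine((1, matmul(sz, sx)), (-1, matmul(sx, sz))), combine((2j, sy)))
        and close(combine((1, matmul(sx, sy)), (-1, matmul(sy, sx))), combine((2j, sz))))
ok54 = all(close(matmul(s, s), identity) for s in (sx, sy, sz))
ok55 = all(close(matmul(p, q), combine((1j, r))) and close(matmul(q, p), combine((-1j, r)))
           for p, q, r in ((sy, sz, sx), (sz, sx, sy), (sx, sy, sz)))
ok56 = close(matmul(matmul(sx, sy), sz), combine((1j, identity)))

l, m, n = 1 / 3, 2 / 3, 2 / 3     # assumed direction cosines
ok58 = close(combine((l, sx), (m, sy), (n, sz)), [[n, l - 1j * m], [l + 1j * m, -n]])
print(ok53, ok54, ok55, ok56, ok58)
```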


In our representation with σ_z diagonal, the representative of a ψ, say ψ_x, will be written (σ_z'|x) and will consist of just two numbers, corresponding to the two values +1 and −1 for σ_z'. These two numbers may be considered as forming a function of the variable σ_z' whose domain consists of only the two points +1 and −1. The state for which σ_z has the value unity will be represented by the ψ whose representative is the function, f_α(σ_z') say, consisting of the pair of numbers 1, 0, and that for which σ_z has the value −1 will be represented by the ψ whose representative is the function, f_β(σ_z') say, consisting of the pair 0, 1. Any function of the variable σ_z', i.e. any pair of numbers, can be expressed as a linear combination of these two. Thus any state can be obtained by superposition of the two states for which σ_z equals +1 and −1 respectively. For example, the state for which the component of σ in the direction l, m, n, represented by (58), has the value +1 is represented by the ψ whose representative is the pair of numbers a, b which satisfy


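$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},$$

or

$$na + (l - im)b = a, \qquad (l + im)a - nb = b.$$

Thus

$$\frac{a}{b} = \frac{l - im}{1 - n} = \frac{1 + n}{l + im}.$$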
This state can be regarded as a superposition of the two states for which σ_z equals +1 and −1, the relative weights in the superposition process being as

$$|a|^2 : |b|^2 = |l - im|^2 : (1 - n)^2 = 1 + n : 1 - n.$$
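The eigenvector calculation can be checked for a particular direction. This sketch assumes l, m, n = 1/3, 2/3, 2/3. It builds (a, b) from a/b = (l − im)/(1 − n), normalizes the pair, and confirms both that it is an eigenvector of (58) with eigenvalue +1 and that |a|² : |b|² = 1 + n : 1 − n, which here is 5 : 1.

```python
import math

l, m, n = 1 / 3, 2 / 3, 2 / 3   # assumed direction cosines (l^2 + m^2 + n^2 = 1)
matrix58 = [[n, l - 1j * m], [l + 1j * m, -n]]

a, b = l - 1j * m, 1 - n          # a/b = (l - im)/(1 - n)
norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
a, b = a / norm, b / norm         # normalized representative of the state

image = [matrix58[0][0] * a + matrix58[0][1] * b,
         matrix58[1][0] * a + matrix58[1][1] * b]
is_eigenvector = abs(image[0] - a) < 1e-12 and abs(image[1] - b) < 1e-12
weight_ratio = abs(a) ** 2 / abs(b) ** 2
print(is_eigenvector, weight_ratio, (1 + n) / (1 - n))
```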


IV. REPRESENTATION THEORY FOR CONTINUOUS EIGENVALUES


20. Introduction of the δ function


In the preceding chapter we saw the convenience of using a representation in which the basic ψ's are eigen-ψ's of some chosen observable, or simultaneous eigen-ψ's of some chosen set of commuting observables. We considered, however, only the case when the chosen observables have as eigenvalues discrete sets of numbers, all our equations being written down for this case. It is possible for an observable to have as eigenvalues all numbers in a certain range, and in that case it becomes necessary to make some modification in the theory. From general physical grounds and from the possibility of regarding a continuous range of numbers as a limiting form of a discrete set whose density is increased indefinitely, one would expect the theory to run on somewhat parallel lines in the two cases. It would be desirable to have this parallelism as accurate as possible, and our development of the transformation theory for continuous ranges of eigenvalues will be made with this object in view.

Let us take an observable ξ with a continuous range of eigenvalues and suppose for the present that it has only one independent eigen-ψ belonging to any eigenvalue. Then, ignoring for the present the question of normalization, we can take its eigen-ψ's, ψ(ξ'), as basic ψ's of a representation. The number of these basic ψ's, equal to the number of axes of our system of coordinates, is an infinity of a high order, equal to the number of points on a line, but this is not in itself a source of difficulty. The fundamental equation defining the representative of a ψ, corresponding to equation (1) of the preceding chapter, must now read


$$\psi_x = \int \psi(\xi')\,d\xi'\,(\xi'|x), \qquad (1)$$


with an integral instead of a sum, the range of integration being understood to be the range of eigenvalues of ξ. (To conform to a neat style of writing in such equations it is desirable to put a differential element such as dξ' between the two factors that involve the corresponding parameter ξ'.) The representative of ψ_x, namely (ξ'|x), is now a function of the continuous variable ξ'.

We meet already with a difficulty in that not every ψ can be expanded in the form (1), in spite of the expansion theorem requiring every ψ to be expressible










as a linear function of the ψ(ξ')'s. An example of a ψ that cannot be expanded in the form (1) is one of the basic ψ's itself, say ψ(ξ''). Another example is the differential coefficient dψ(ξ'')/dξ'', when ψ(ξ'') involves the parameter ξ'' in a sufficiently continuous way for this differential coefficient to exist. But in order that our theory for continuous eigenvalues may at all resemble our previous theory for discrete eigenvalues, it is necessary that such an expansion should be possible for every ψ, at least formally. We get over the difficulty by allowing the representative (ξ'|x) to involve infinities and singularities of a certain type, chosen in just such a way as to make the expansion (1) always formally possible. This means allowing that (ξ'|x) need not be a function of its variable ξ' according to the usual mathematical sense, which would require it to have a definite value for each value of its variable lying in a certain range, but may be something more general, which we call an improper function of the variable ξ'. Such an improper function may be pictured as the limit of a sequence of ordinary functions, corresponding to the fact that a ψ which cannot be expressed in the form (1) with (ξ'|x) an ordinary function of ξ' may be regarded as the limit of a sequence of ψ's that can.

The chief improper function which we shall have to deal with is the δ function, defined by

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1, \qquad \delta(x) = 0 \quad \text{for } x \neq 0. \qquad (2)$$

To get a picture of δ(x), take a function of the real variable x which vanishes everywhere except inside a small domain, of length ε say, surrounding the origin x = 0, and which is so large inside this domain that its integral over this domain is unity. The exact shape of the function inside this domain does not matter, provided there are no unnecessarily wild variations (for example provided the function is always of order ε⁻¹). Then in the limit ε → 0 this function will go over into the δ function.
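The limiting picture of δ(x) can be made concrete numerically. As a minimal sketch, assume a rectangular pulse of width ε and height 1/ε as the sequence of ordinary functions and f(x) = eˣ as the test function. The integral of f(x) times the pulse then approaches f(0) = 1 as ε shrinks. The helper names are hypothetical.

```python
import math

def box_pulse(x, eps):
    """Ordinary function: height 1/eps on (-eps/2, eps/2), zero elsewhere (integral 1)."""
    return 1.0 / eps if abs(x) < eps / 2 else 0.0

def integrate_against_pulse(f, eps, n=2000):
    """Midpoint rule for the integral of f(x) * box_pulse(x, eps) over the pulse's support."""
    h = eps / n
    total = 0.0
    for k in range(n):
        x = -eps / 2 + (k + 0.5) * h
        total += f(x) * box_pulse(x, eps) * h
    return total

results = {eps: integrate_against_pulse(math.exp, eps) for eps in (1.0, 0.1, 0.01)}
for eps, value in results.items():
    print(eps, value)   # approaches f(0) = exp(0) = 1 as eps -> 0
```

For this particular pulse the exact value is 2 sinh(ε/2)/ε, which makes the convergence to 1 easy to confirm.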

The most important property of the δ function is exemplified by the following equation,

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0), \qquad (3)$$

where f(x) is any continuous function of x. We can easily see the validity of this equation from the above picture of δ(x). The left-hand side of (3) can depend only on the values of f(x) very close to the origin, so that we may replace f(x) by its value at the origin, f(0), without serious error. Equation (3) then follows from the first of equations (2). By making a change of origin in (3), we can deduce




the formula

$$\int f(x)\,\delta(x - a)\,dx = f(a), \qquad (4)$$

where a is any real number. Thus the process of multiplying a function of x by δ(x − a) and integrating over all x is equivalent to the process of substituting a for x. The range of integration, of course, need not necessarily be from −∞ to ∞, but may be over any domain surrounding the critical point at which the δ function does not vanish. In future the limits of integration will usually be omitted in such equations, it being understood that the domain of integration is a suitable one.

Formula (4) must hold equally well whether f is a scalar, or a vector or tensor function of x. By an application of (4) with f a ψ-vector, we see that

$$\psi(\xi'') = \int \psi(\xi')\,d\xi'\,\delta(\xi' - \xi''), \qquad (5)$$

provided ψ(ξ') depends continuously on the variable ξ'. In this way we can express the basic ψ, ψ(ξ''), in the form (1). The representative of ψ(ξ'') is just δ(ξ' − ξ'').

In order to express dψ(ξ'')/dξ'' in the form (1), we have to use the derivative of the δ function, which is another improper function, more improper than the δ function itself. This derivative δ' has the important property that, for any differentiable function f(x),

$$\int f(x)\,\delta'(x - a)\,dx = -f'(a). \qquad (6)$$


We can verify this property either by integrating the left-hand side by parts and then applying (4) with f'(x) instead of f(x), or by differentiating both sides of (4) with respect to a. The agreement of these two methods of verification provides evidence of the self-consistency of our use of improper functions. Formula (6) shows that the process of multiplying a function of x by δ'(x − a) and integrating over all x is equivalent to the process of differentiating it with respect to x and substituting a for x, with the provision of a minus sign. From (6) we now get

$$\frac{d\psi(\xi'')}{d\xi''} = -\int \psi(\xi')\,d\xi'\,\delta'(\xi' - \xi''), \qquad (7)$$

which expresses the ψ-vector dψ(ξ'')/dξ'' in the form (1) and shows that its representative is −δ'(ξ' − ξ''). The higher derivatives of ψ(ξ'') with respect to ξ'' can also be expressed in the form (1) with the help of the higher derivatives of the δ function.

The foregoing work shows how the expansion of a ψ as an integral in the form (1) can be made of universal validity by the introduction of suitable improper functions. In this way we can get a foundation for the theory of representations in the case of continuous eigenvalues, corresponding to the foundation provided by equation (1) of Chapter III for the discrete case.
Our definition and use of improper functions is not rigorous according to 
the standards of pure mathematics. It should be noticed, though, that an improper 
function can be given a rigorous meaning whenever it is a factor in an integrand. 
Now in the development of the theory, in every case where we have an improper 
function it will be something which is to be used finally only in integrands. 
We could therefore rewrite the theory in a form in which the improper functions 
appear all through only in integrands and could then eliminate the improper 
functions altogether and make the theory rigorous. The use of improper functions 
is thus not really connected with any essential lack of rigour in the theory. It is, 
rather, a convenient notation, enabling us to express in a concise form certain 
fundamental formulas which we could, if necessary, rewrite in a rigorous form, 
but only in a cumbersome way in which the parallelism with the case of discrete 
eigenvalues would be obscured. We shall confine our use of improper functions to 
such elementary equations that it will be obvious that the lack of rigour associated 
with them will not lead to a wrong result. 
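Formulas (4) and (6) can be checked numerically by replacing δ with a narrow normalized Gaussian of width ε and δ' with its derivative. The Gaussian is an assumption (any sufficiently smooth, sharply peaked sequence would serve), as are the test function f = sin and the point a = 0.7. The agreement improves as ε → 0.

```python
import math

def gaussian_delta(x, eps):
    """Assumed stand-in for delta(x): a normalized Gaussian of width eps."""
    return math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def gaussian_delta_prime(x, eps):
    """Derivative of gaussian_delta with respect to x, standing in for delta'(x)."""
    return -x / (eps * eps) * gaussian_delta(x, eps)

def integrate(func, lo, hi, n=4000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * h) for k in range(n)) * h

f, f_prime = math.sin, math.cos   # assumed test function and its derivative
a, eps = 0.7, 0.01
window = (a - 12 * eps, a + 12 * eps)   # the Gaussian is negligible outside this range

lhs4 = integrate(lambda x: f(x) * gaussian_delta(x - a, eps), *window)
lhs6 = integrate(lambda x: f(x) * gaussian_delta_prime(x - a, eps), *window)
print(lhs4, f(a))          # formula (4)
print(lhs6, -f_prime(a))   # formula (6)
```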


21. Properties of the δ function


An alternative way of defining the δ function is as the differential coefficient ε'(x) of the function ε(x) given by

$$\epsilon(x) = 0 \quad (x < 0), \qquad \epsilon(x) = 1 \quad (x > 0).$$

We may verify formally that this is equivalent to the previous definition by substituting ε'(x) for δ(x) in the left-hand side of (3) and integrating by parts. We find, for g_1 and g_2 two positive numbers,

$$\int_{-g_2}^{g_1} f(x)\,\epsilon'(x)\,dx = \Big[f(x)\,\epsilon(x)\Big]_{-g_2}^{g_1} - \int_{-g_2}^{g_1} f'(x)\,\epsilon(x)\,dx = f(g_1) - \int_{0}^{g_1} f'(x)\,dx = f(0),$$

in agreement with (3). The δ function appears whenever one differentiates a discontinuous function.

There are a number of elementary equations which one can write down about 
6 functions. These equations are essentially rules of manipulation for algebraic 
work involving 6 functions. The meaning of any of these equations is that its two 
sides give equivalent results as factors in an integrand. 




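Examples of such equations are

$$\begin{aligned} \delta(-x) &= \delta(x), \qquad \delta'(-x) = -\delta'(x), && (8) \\ x\,\delta(x) &= 0, && (9) \\ x\,\delta'(x) &= -\delta(x), && (10) \\ \delta(ax) &= a^{-1}\delta(x) \quad (a > 0), && (11) \\ \delta(x^2 - a^2) &= \tfrac{1}{2}a^{-1}\{\delta(x - a) + \delta(x + a)\} \quad (a > 0), && (12) \\ \int \delta(a - x)\,dx\,\delta(x - b) &= \delta(a - b), && (13) \\ f(x)\,\delta(x - a) &= f(a)\,\delta(x - a). && (14) \end{aligned}$$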


Equation (8), which merely states that δ(x) is an even function of its variable x, is trivial. To verify (9) take any continuous function of x, f(x). Then

$$\int f(x)\,x\,\delta(x)\,dx = 0,$$

from (3). Thus xδ(x) as a factor in an integrand is equivalent to zero, which is just the meaning of (9). Again, to verify (10) take any differentiable function f(x). Then, from (6) with a = 0,

$$\int f(x)\,x\,\delta'(x)\,dx = -\left[\frac{d}{dx}\big(x f(x)\big)\right]_{x=0} = -f(0) = -\int f(x)\,\delta(x)\,dx,$$

from (3). Thus xδ'(x) as a factor in an integrand is equivalent to −δ(x), which is just the meaning of (10). (11) and (12) may be verified by similar elementary arguments. To verify (13) take any continuous function of a, f(a). Then

$$\int f(a)\,da \int \delta(a - x)\,dx\,\delta(x - b) = \int \delta(x - b)\,dx \int f(a)\,da\,\delta(a - x) = \int \delta(x - b)\,dx\,f(x) = f(b) = \int f(a)\,da\,\delta(a - b).$$

Thus the two sides of (13) are equivalent as factors in an integrand with a as variable of integration. It may be shown in the same way that they are equivalent also as factors in an integrand with b as variable of integration, so that equation (13) is justified from either of these points of view. Equation (14) is also easily justified, with the help of (4), from two points of view.

Equation (13) would be given by an application of (4) with f(x) = δ(a − x). We have here an illustration of the fact that we may often use an improper function as though it were an ordinary continuous function, without getting a wrong result. Another such illustration is obtained if we notice that the differentiation of equation (9) by the ordinary rules, giving δ(x) + xδ'(x) = 0, leads to equation (10).
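The rules (10), (11) and (12) can be tested numerically as statements about factors in an integrand. Again a narrow Gaussian stands in for δ (an assumption), and the test function f(x) = eˣ is deliberately not even, so that both terms of (12) matter. The tolerances reflect the finite width ε = 0.01.

```python
import math

def gaussian_delta(x, eps):
    """Assumed stand-in for delta(x): a normalized Gaussian of width eps."""
    return math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def gaussian_delta_prime(x, eps):
    """Derivative of gaussian_delta with respect to x."""
    return -x / (eps * eps) * gaussian_delta(x, eps)

def integrate(func, lo, hi, n):
    """Midpoint rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * h) for k in range(n)) * h

f = math.exp   # assumed test function (not even)
eps = 0.01

# (10): x delta'(x) acts like -delta(x).
lhs10 = integrate(lambda x: f(x) * x * gaussian_delta_prime(x, eps), -12 * eps, 12 * eps, 4000)
# (11): delta(3x) acts like delta(x) / 3.
lhs11 = integrate(lambda x: f(x) * gaussian_delta(3 * x, eps), -12 * eps, 12 * eps, 4000)
# (12): delta(x^2 - a^2) acts like {delta(x - a) + delta(x + a)} / (2a).
a = 1.5
lhs12 = integrate(lambda x: f(x) * gaussian_delta(x * x - a * a, eps), -3.0, 3.0, 200000)
print(lhs10, -f(0))
print(lhs11, f(0) / 3)
print(lhs12, (f(a) + f(-a)) / (2 * a))
```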




A further example of a useful improper equation is that, for real a,

$$\int_{-\infty}^{\infty} e^{iax}\,dx = 2\pi\,\delta(a). \qquad (15)$$

From the standpoint of rigorous mathematics, the left-hand side of this equation is not a definite quantity at all, even when a differs from zero, since the integral even then does not converge when the limits of integration tend to ∞ and −∞, but oscillates. If, however, we fix the limits of integration at g_1 and −g_2 say, where g_1 and g_2 are both very large, and consider the dependence of the left-hand side of (15) on a, we see that it oscillates very rapidly about the value zero, except when a is very small. These oscillations will produce no effect in an integrand, and thus we can see the validity of (15) for a ≠ 0 according to our present standpoint. To verify (15) for a in the neighbourhood of zero, take any continuous function of a, f(a). Then

$$\int f(a)\,da \int_{-g}^{g} e^{iax}\,dx = \int f(a)\,da\;2a^{-1}\sin ag = 2\pi f(0),$$

in the limit when g tends to infinity. A rather more complicated argument shows that we get the same result if instead of the limits g and −g we put g_1 and −g_2, and then let g_1 and g_2 tend to infinity in different ways. We can now see the equivalence of the two sides of (15) as factors in an integrand.
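The limit used to verify (15) can be watched numerically. For the assumed test function f(a) = e^{−a²}, the a-integral of f(a)·2a⁻¹ sin ag has the closed form 2π erf(g/2), which tends to 2πf(0) = 2π as g → ∞. The sketch evaluates the integral by the midpoint rule and compares the result with that closed form.

```python
import math

def smeared(f, g, half_width=8.0, h=1e-3):
    """Midpoint-rule value of the integral of f(a) * 2 sin(a g) / a over a."""
    n = int(round(2 * half_width / h))
    total = 0.0
    for k in range(n):
        a = -half_width + (k + 0.5) * h   # midpoints never land exactly on a = 0
        total += f(a) * 2 * math.sin(a * g) / a
    return total * h

f = lambda a: math.exp(-a * a)   # assumed smooth test function, f(0) = 1
results = {g: smeared(f, g) for g in (2.0, 20.0)}
for g, value in results.items():
    print(g, value, 2 * math.pi * math.erf(g / 2), 2 * math.pi * f(0))
```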
As an illustration of work with the δ function, we may consider the differentiation of log_e x. The usual formula

$$\frac{d}{dx}\log_e x = \frac{1}{x} \qquad (16)$$

requires examination for the neighbourhood of x = 0. In order to make the reciprocal function 1/x well-defined in the neighbourhood of x = 0 (well-defined as an improper function) we must impose on it an extra condition, such as that its integral from −ε to ε vanishes. With this extra condition, the integral of the right-hand side of (16) from −ε to ε vanishes, while that of the left-hand side of (16) [has a magnitude that] equals log_e(−1), so that (16) is not a correct equation. To correct it, we must remember that, taking principal values, log_e x has an imaginary† term iπ for negative values of x. As x passes through the value zero this imaginary† term vanishes discontinuously. The differentiation of this imaginary† term gives us the result −iπδ(x), so that (16) should read

$$\frac{d}{dx}\log_e x = \frac{1}{x} - i\pi\,\delta(x). \qquad (17)$$


The particular combination of reciprocal function and 6 function appearing in (17) 
plays an important part in the quantum theory of collision processes (see §53). 
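One way to see (17) concretely is to move the singularity slightly off the real axis. This regularization is an assumption of the sketch, not the argument given above. The principal value of log(x + iε) has imaginary part close to π for x < 0 and close to 0 for x > 0, and its derivative is 1/(x + iε). For the even test function f(x) = e^{−x²}, whose principal-value integral against 1/x vanishes, the integral of f(x)/(x + iε) should then approach −iπf(0) as ε → 0. For this f the finite-ε value has the closed form −iπ e^{ε²} erfc(ε).

```python
import cmath
import math

def smeared_log_derivative(f, eps, half_width=6.0, steps_per_eps=20):
    """Midpoint-rule integral of f(x) * d/dx log(x + i*eps) = f(x) / (x + i*eps)."""
    h = eps / steps_per_eps
    n = int(round(2 * half_width / h))
    total = 0j
    for k in range(n):
        x = -half_width + (k + 0.5) * h
        total += f(x) / (x + 1j * eps) * h
    return total

f = lambda x: math.exp(-x * x)   # assumed even test function: principal value of f/x is 0
eps = 0.01
value = smeared_log_derivative(f, eps)
exact = -1j * math.pi * math.exp(eps ** 2) * math.erfc(eps)   # closed form for this f
print(value, exact, -1j * math.pi * f(0))
# Principal branch: the log of a negative number carries an imaginary term close to i*pi.
print(cmath.log(-2.0 + 1e-12j))
```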


† [‘pure’ omitted.]




22. Representations with One Continuous Parameter 


Let us go back to the representation given by equation (1). Our problem now is to find a suitable way of fixing the length of the basic ψ's ψ(ξ'), as the usual normalization rule does not work here. We want some formula to replace (3) of Chapter III. We can attack this problem by referring to the physical meaning of the modulus of the representative (ξ'|x) of a normalized ψ_x. If the eigenvalues ξ' are discrete, the square of this modulus, |(ξ'|x)|², gives us, as we saw in §18, the probability of ξ having the value ξ' for the state represented by this normalized ψ_x. If the eigenvalues ξ' are continuous, the probability of ξ having exactly the value ξ' for any physically obtainable state will be zero. The interesting quantity now will be the probability of ξ having a value lying within a specified range, say the small range from ξ' to ξ' + dξ'. It would be convenient if we could arrange so that this probability is just |(ξ'|x)|² dξ'. We should then have a close parallelism in the physical meanings of |(ξ'|x)| in the two cases.

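We should then want to arrange for the average value of any function f(ξ) of ξ to be ∫ f(ξ')|(ξ'|x)|² dξ' for the state represented by any normalized ψ_x. Now from the general physical interpretation of §12, this average must be φ_x f(ξ) ψ_x. We must therefore try to arrange so that

$$\phi_x f(\xi)\,\psi_x = \int f(\xi')\,d\xi'\,|(\xi'|x)|^2. \qquad (18)$$

From (1) and its conjugate imaginary

$$\phi_x = \int (x|\xi')\,d\xi'\,\phi(\xi'), \qquad (19)$$

we get

$$\begin{aligned} \phi_x f(\xi)\,\psi_x &= \iint (x|\xi')\,d\xi'\,\phi(\xi')\,f(\xi)\,\psi(\xi'')\,d\xi''\,(\xi''|x) \\ &= \int (x|\xi')\,d\xi'\,f(\xi') \int \phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|x) \end{aligned} \qquad (20)$$

with the help of (28) of Chapter II. Now we want (18) to hold for an arbitrary function f, and hence we can equate coefficients of f(ξ') on the right-hand sides of (18) and (20). This gives

$$|(\xi'|x)|^2 = (x|\xi') \int \phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|x),$$

or

$$(\xi'|x) = \int \phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|x).$$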






In order that this may hold for an arbitrary function (ξ'|x) of ξ', we must have, from (4),

$$\phi(\xi')\,\psi(\xi'') = \delta(\xi' - \xi''). \qquad (21)$$

Equation (21) is the fundamental formula which the basic ψ's have to satisfy in the continuous case, corresponding to formula (3) of Chapter III in the discrete case. If written in rigorous mathematical notation without the δ function, it would read

$$\phi(\xi')\,\psi(\xi'') = 0 \quad (\xi' \neq \xi''), \qquad (22)$$

$$\int \phi(\xi')\,\psi(\xi'')\,d\xi'' = 1. \qquad (23)$$


Equation (22) expresses that different basic ψ's are orthogonal, exactly as in the discrete case. Equation (23) replaces the normalizing condition for the discrete case and may be called the normalizing condition for a ψ labelled by a parameter that takes on a continuous range of values. It should be remembered, though, that this involves a rather different meaning of the word 'normalizing' from what we had previously. A ψ(ξ') normalized according to (23) is not of unit length, but has an infinitely great length, as may be seen from (21), which gives φ(ξ')ψ(ξ') = δ(0). Thus a ψ(ξ') normalized according to (23) would not be correctly normalized for the general physical interpretation of §12 to be applicable, i.e. the average value of an observable α for the state represented by such a ψ(ξ') would not be φ(ξ')αψ(ξ'). It would be infinitely smaller. Still, the ratio of the average values of two observables α and β would be the ratio of φ(ξ')αψ(ξ') to φ(ξ')βψ(ξ'), and such ratios would be all one would usually wish to calculate about these average values. The state represented by a basic ψ, ψ(ξ'), is not of a kind that can actually exist. If an observable such as ξ with a continuous range of eigenvalues is measured for any actual state, the result must be distributed over a finite range according to some definite probability law, which range may be made as small as we please but cannot be contracted to a single point. The state represented by ψ(ξ') may, however, be regarded as a limit of actual states and as such it is a very useful theoretical abstraction.
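The infinite length of a ψ normalized by (23) can be made visible with a crude discretization. The following is a minimal sketch that assumes, purely for illustration, that the continuous range is replaced by a grid of spacing Δ, so that δ(ξ' − ξ'') becomes the Kronecker delta divided by Δ. Under that assumption the grid versions of (22) and (23) hold exactly, while the squared length φ(ξ')ψ(ξ') = 1/Δ grows without bound as Δ → 0. The helper names `delta_grid` and `normalization_integral` are hypothetical.

```python
def delta_grid(n, spacing):
    """Grid stand-in for delta(xi' - xi''): Kronecker delta divided by the spacing."""
    return [[(1.0 / spacing if i == j else 0.0) for j in range(n)] for i in range(n)]

def normalization_integral(row, spacing):
    """Riemann-sum version of the integral over xi'' in (23)."""
    return sum(value * spacing for value in row)

n, centre = 11, 5
for spacing in (0.1, 0.01, 0.001):
    overlaps = delta_grid(n, spacing)            # phi(xi') psi(xi'') for all grid pairs
    integral = normalization_integral(overlaps[centre], spacing)  # stays 1, as in (23)
    squared_length = overlaps[centre][centre]    # phi(xi') psi(xi'), i.e. delta(0) = 1/spacing
    print(spacing, integral, squared_length)
```

The integral stays at 1 for every spacing, but the diagonal entry 1/Δ diverges. This is the grid counterpart of the statement that such a ψ has an infinitely great length.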

We can now proceed with the development of the theory on parallel lines to Chapter III. From equation (1) with the suffix x replaced by y and equation (19), we get

$$ \begin{aligned} \phi_x\psi_y &= \iint (x|\xi')\,d\xi'\,\phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|y) \\ &= \iint (x|\xi')\,d\xi'\,\delta(\xi' - \xi'')\,d\xi''\,(\xi''|y) \\ &= \int (x|\xi')\,d\xi'\,(\xi'|y) \end{aligned} \tag{24} $$


22. Representations with One Continuous Parameter 75 


with the help of (21) and (4). This is the formula for the product of a φ and a ψ in terms of their representatives. It corresponds to (8) of Chapter III, differing from that formula only in that the sum is replaced by an integral.

We define the representative of a linear operator α by

$$ \alpha\,\psi(\xi'') = \int \psi(\xi')\,d\xi'\,(\xi'|\alpha|\xi''), \tag{25} $$

corresponding to (11) of Chapter III. (An alternative definition, corresponding to (9) of Chapter III, would also be possible.) The representative (ξ'|α|ξ'') is now a function of two variables ξ' and ξ'' which can vary continuously. It is convenient to call such a function a 'matrix', using this word in a generalized sense, in order that we may be able to use the same terminology for the discrete and continuous cases. One of these generalized matrices cannot, of course, be written out as a two-dimensional array like an ordinary matrix, since the number of its rows and columns is an infinity equal to the number of points on a line. The law of multiplication for these generalized matrices is found to be, by an analogous piece of work to that leading to (17) of Chapter III,

$$ (\xi'|\alpha\beta|\xi'') = \int (\xi'|\alpha|\xi''')\,d\xi'''\,(\xi'''|\beta|\xi''). \tag{26} $$
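The multiplication law (26) can be tried out numerically once a generalized matrix is sampled on a grid. The following is a minimal sketch that assumes a uniform grid of spacing Δ (an illustrative discretization, not part of the theory). The integral over ξ''' becomes a sum weighted by Δ, and δ(ξ' − ξ'') becomes the identity divided by Δ. The kernel used for α is an arbitrary hypothetical choice. The check confirms that this grid version of the unit matrix leaves α unchanged when multiplied on either side.

```python
import math

def sample_kernel(kernel, grid):
    """Sample a generalized matrix (xi'|alpha|xi'') on a grid of xi values."""
    return [[kernel(a, b) for b in grid] for a in grid]

def kernel_product(A, B, spacing):
    """Discretized law (26): the integral over xi''' becomes a spacing-weighted sum."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) * spacing
             for j in range(n)] for i in range(n)]

def unit_kernel(n, spacing):
    """Grid stand-in for delta(xi' - xi'')."""
    return [[(1.0 / spacing if i == j else 0.0) for j in range(n)] for i in range(n)]

spacing = 0.05
grid = [-1.0 + spacing * i for i in range(41)]
# Hypothetical smooth kernel standing in for (xi'|alpha|xi'').
A = sample_kernel(lambda a, b: math.exp(-(a - b) ** 2) * (1 + a * b), grid)
I = unit_kernel(len(grid), spacing)

left = kernel_product(I, A, spacing)    # unit matrix on the left
right = kernel_product(A, I, spacing)   # unit matrix on the right
print(max(abs(left[i][j] - A[i][j]) for i in range(41) for j in range(41)))
```

The factor Δ in `kernel_product` is what makes the identity-over-Δ kernel act as a unit matrix. Without it, the grid sum would not be a faithful stand-in for the integral in (26).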


It is the same law as in the discrete case, except that the sum is replaced by an integral. More generally, one can easily verify that the whole theory of multiplication of representatives given in §15 can be taken over if one just replaces sums by integrals all through. Equation (24) is an example of this. Further, the explicit expressions for representatives given by (5), (6) and (13) of §14 have their analogues in the present theory. For example, corresponding to (5) of §14, we get by multiplying (1) on the left by φ(ξ')

$$ \begin{aligned} \phi(\xi')\,\psi_x &= \int \phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|x) \\ &= \int \delta(\xi' - \xi'')\,d\xi''\,(\xi''|x) \\ &= (\xi'|x) \end{aligned} $$

from (21) and (4). Similarly, corresponding to (13) of §14, we find

$$ (\xi'|\alpha|\xi'') = \phi(\xi')\,\alpha\,\psi(\xi''). \tag{27} $$


An element on the diagonal, (ξ'|α|ξ'), is no longer, however, just the average value of the observable α (when the linear operator α represents an observable) for the basic state represented by ψ(ξ'), since the φ- and ψ-vectors on the right-hand side of (27) are not correctly normalized for this to be so.




From (27) and (21) we find that the operator of multiplication by a number k is represented by

$$ (\xi'|k|\xi'') = k\,\delta(\xi' - \xi''), \tag{28} $$

corresponding to (14) of Chapter III. The identical operator is represented by δ(ξ' − ξ''). For this reason the unit matrix in our present scheme of generalized matrices is defined as the matrix whose (ξ', ξ'') element is δ(ξ' − ξ''). As defined in this way, the unit matrix leaves unchanged any matrix when multiplied into it, on either the right- or left-hand side, according to the law of multiplication (26). We define a diagonal matrix in our present scheme as one whose (ξ', ξ'') element is equal to some function of ξ' [or of ξ'', which would be equivalent, from (14)] multiplied into δ(ξ' − ξ''), and this function, i.e. the coefficient of δ(ξ' − ξ''), we call a diagonal element. The reason for this definition is that it is the widest one which gives to diagonal matrices the property of always commuting with one another, which property is a most fundamental one for diagonal matrices in the discrete case and is specially important in our theory of representations. It would not be sufficient to define a diagonal matrix merely as one whose (ξ', ξ'') elements all vanish except when ξ' differs infinitely little from ξ'', as that would include a matrix such as δ'(ξ' − ξ''), which, as is easily verified by the methods of §21, does not commute with the matrix f(ξ')δ(ξ' − ξ'') unless f(ξ') is a constant. With the above definition the unit matrix and the matrix (ξ'|k|ξ'') given by (28) are diagonal matrices. Further, the representative of ξ is also a diagonal matrix, since, as easily follows from (25) or (27),

$$ (\xi'|\xi|\xi'') = \xi'\,\delta(\xi' - \xi''). \tag{29} $$
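The reason for the definition of a diagonal matrix can be illustrated on a grid. The following is a minimal sketch that uses a uniform grid of spacing Δ (our discretization). It represents f(ξ')δ(ξ' − ξ'') as diag(f)/Δ and approximates the derivative δ'(ξ' − ξ'') by a central difference, which makes that kernel vanish except on neighbouring grid points. All function names are hypothetical. Diagonal kernels commute with one another, and the δ' kernel commutes with a constant multiple of δ. It fails to commute with ξ'δ(ξ' − ξ''), in line with the text.

```python
def diagonal_kernel(f, grid, spacing):
    """Grid stand-in for f(xi') delta(xi' - xi'')."""
    n = len(grid)
    return [[(f(grid[i]) / spacing if i == j else 0.0) for j in range(n)] for i in range(n)]

def delta_prime_kernel(n, spacing):
    """Central-difference stand-in for delta'(xi' - xi''): nonzero only next to the diagonal."""
    return [[((1.0 if j == i + 1 else 0.0) - (1.0 if j == i - 1 else 0.0)) / (2 * spacing ** 2)
             for j in range(n)] for i in range(n)]

def kernel_product(A, B, spacing):
    """Discretized multiplication law (26)."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) * spacing for j in range(n)]
            for i in range(n)]

def commutator_size(A, B, spacing):
    """Largest entry of AB - BA under the discretized law (26)."""
    AB, BA = kernel_product(A, B, spacing), kernel_product(B, A, spacing)
    n = len(A)
    return max(abs(AB[i][j] - BA[i][j]) for i in range(n) for j in range(n))

spacing = 0.1
grid = [spacing * i for i in range(21)]
X = diagonal_kernel(lambda x: x, grid, spacing)        # representative of xi, as in (29)
X2 = diagonal_kernel(lambda x: x * x, grid, spacing)   # another diagonal matrix
C = diagonal_kernel(lambda x: 3.0, grid, spacing)      # constant f, as in (28)
D = delta_prime_kernel(len(grid), spacing)

print(commutator_size(X, X2, spacing))  # diagonal matrices commute
print(commutator_size(D, C, spacing))   # delta' commutes with a constant multiple of delta
print(commutator_size(D, X, spacing))   # but not with xi' delta(xi' - xi'')
```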


Thus choosing a representation in which the basic ψ's are eigen-ψ's of ξ is equivalent to choosing a representation in which ξ is diagonal, in the case of continuous eigenvalues ξ' just as well as in the case of discrete eigenvalues, and the diagonal elements of the representative of ξ are in both cases just its eigenvalues ξ'. We can now see how the whole representation theory of the preceding chapter may be taken over to the case of a continuous range of basic states. We simply have to replace sums by integrals and the two-suffix δ symbol $\delta_{\xi'\xi''}$ by the δ function δ(ξ' − ξ''), all the way through.

The transformation theory of §17 may be taken over in the same way. If η is a second observable with continuous eigenvalues and we assume it for the present to have only one independent eigen-ψ belonging to each eigenvalue, we can introduce a second representation in which the basic ψ's are eigen-ψ's of η and in which, therefore, η is diagonal. There will then be transformation functions, (ξ'|η') and its conjugate complex (η'|ξ'), which enable one to pass from a ξ-representative to an η-representative by formulas analogous to those of §17, with sums replaced by integrals. The conditions which the transformation functions have to satisfy will be

$$ \begin{aligned} \int (\xi'|\eta')\,d\eta'\,(\eta'|\xi'') &= \delta(\xi' - \xi''), \\ \int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') &= \delta(\eta' - \eta''), \end{aligned} \tag{30} $$

instead of (38) and (39) of §17.
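In the finite discrete setting that (30) generalizes, that is, the conditions of §17, a transformation function is a unitary matrix. The two conditions then read U U† = 1 and U† U = 1. The following is a minimal sketch that assumes an 8-point discrete Fourier transform matrix as an example transformation function (our choice, picked only because it is a familiar unitary matrix). The helper names are hypothetical.

```python
import cmath

def dft_transformation(n):
    """Example (xi'|eta') for an n-state system: the unitary discrete Fourier matrix."""
    return [[cmath.exp(2j * cmath.pi * a * b / n) / n ** 0.5 for b in range(n)]
            for a in range(n)]

def conjugate_transpose(U):
    """(eta'|xi') is the complex conjugate of (xi'|eta')."""
    return [[U[j][i].conjugate() for j in range(len(U))] for i in range(len(U[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def distance_from_identity(M):
    """Largest deviation of M from the unit matrix (the discrete delta symbol)."""
    n = len(M)
    return max(abs(M[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))

U = dft_transformation(8)
U_dagger = conjugate_transpose(U)
print(distance_from_identity(matmul(U, U_dagger)))  # first condition: sum over eta'
print(distance_from_identity(matmul(U_dagger, U)))  # second condition: sum over xi'
```

In the continuous case of (30), each sum over a matrix index becomes an integral and the unit matrix becomes the δ function.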

It would be quite possible for one representation to have continuous eigenvalues and the other discrete.† We should then have similar transformation equations, with sums and integrals each occurring in the appropriate places. Instead of (30) we should have, if ξ' were continuous and η' discrete,

$$ \begin{aligned} \sum_{\eta'} (\xi'|\eta')(\eta'|\xi'') &= \delta(\xi' - \xi''), \\ \int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') &= \delta_{\eta'\eta''}. \end{aligned} \tag{31} $$


The physical interpretation of these transformation functions as probability amplitudes is evident. In the case of ξ' continuous and η' discrete, |(ξ'|η')|² dξ' will be the probability of ξ having a value within the small range from ξ' to ξ' + dξ' for the state for which η certainly has the value η'. Also |(ξ'|η')|² will be proportional to the probability of η having the value η' for the state for which ξ certainly has the value ξ' (the proportionality holding for all values of η'), but will not be equal to this probability, since the φ-vector φ(ξ') which is represented by (ξ'|η') in the η-representation is not properly normalized for this. In the case of both ξ' and η' continuous, |(ξ'|η')|² dξ' will be proportional to the probability of ξ having a value within the small range ξ' to ξ' + dξ' for the state for which η certainly has the value η', and |(ξ'|η')|² dη' will be proportional to the probability of η having a value within the small range η' to η' + dη' for the state for which ξ certainly has the value ξ'.


23. General Representations 


The work of the preceding section can be readily extended to the case when the observable ξ does not have only one independent eigen-ψ belonging to any eigenvalue and when, following the method of the latter part of §16, we take our basic ψ's to be simultaneous eigen-ψ's of ξ and of one or more other observables which commute with ξ and with each other and which together with ξ form a complete set. Let us call the observables of this complete set $\xi_1, \xi_2, \ldots, \xi_n$, and let us suppose each of them has a continuous range of eigenvalues. We can now make a straightforward generalization of our earlier theory, replacing the one-dimensional space of our former single variable ξ' by the n-dimensional space of the variables $\xi'_1, \xi'_2, \ldots, \xi'_n$.

† If the number of basic ψ's is finite in one representation, it must, of course, be finite and equal in any other representation, but it may be infinite enumerable in one representation and infinite equal to the number of points on a line in another. An example of this will be given in §36.

Let us begin by obtaining the representative of one of the basic ψ's, $\psi(\xi''_1, \xi''_2, \ldots, \xi''_n)$, or ψ(ξ'') say, for brevity. We note that

$$ \psi(\xi'') = \int\!\!\int\!\cdots\!\int \psi(\xi')\,d\xi'_1\,d\xi'_2\cdots d\xi'_n\;\delta(\xi'_1 - \xi''_1)\,\delta(\xi'_2 - \xi''_2)\cdots\delta(\xi'_n - \xi''_n), \tag{32} $$

as may be verified by carrying out the integrations one by one with the help of (4). This corresponds to (5) in the one-dimensional case. If we introduce the notation

$$ \delta(\xi' - \xi'') = \delta(\xi'_1 - \xi''_1)\,\delta(\xi'_2 - \xi''_2)\cdots\delta(\xi'_n - \xi''_n), \tag{33} $$

analogous to (29) of Chapter III, we find that the representative of ψ(ξ'') is just δ(ξ' − ξ''), a result which is formally the same as in the one-dimensional case. Also we have, for each m from 1 to n,

$$ \frac{\partial\psi(\xi'')}{\partial\xi''_m} = -\int\!\!\int\!\cdots\!\int \psi(\xi')\,d\xi'_1\,d\xi'_2\cdots d\xi'_n\;\delta(\xi'_1 - \xi''_1)\cdots\delta(\xi'_{m-1} - \xi''_{m-1})\,\delta'(\xi'_m - \xi''_m)\,\delta(\xi'_{m+1} - \xi''_{m+1})\cdots\delta(\xi'_n - \xi''_n), \tag{34} $$

as may be verified by carrying out the integrations one by one with the help of (4) and (6). The integrand in (34) differs from the integrand in (32) only in having the factor $\delta'(\xi'_m - \xi''_m)$ instead of $\delta(\xi'_m - \xi''_m)$. Equation (34) is the generalization of (7) and gives us the representative of $\partial\psi(\xi'')/\partial\xi''_m$.
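The one-variable facts behind (32) and (34) are the sifting properties (4) and (6). The minus sign in (34) comes from (6). These properties can be checked numerically. The following is a minimal sketch that assumes a narrow normalized Gaussian of width ε as a stand-in for δ and uses its exact derivative for δ' (our choice of approximation). The test function f(x) = sin x and the point x'' = 0.3 are arbitrary.

```python
import math

def nascent_delta(x, eps):
    """Normalized Gaussian of width eps; tends to delta(x) as eps -> 0."""
    return math.exp(-x * x / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def nascent_delta_prime(x, eps):
    """Exact derivative of the Gaussian above; tends to delta'(x)."""
    return -x / (eps * eps) * nascent_delta(x, eps)

def integrate(g, a, b, n=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

eps, point = 0.01, 0.3
f = math.sin
# Property (4): the integral of f(x') delta(x' - x'') over x' gives f(x'').
sift = integrate(lambda x: f(x) * nascent_delta(x - point, eps), -1.0, 1.0)
# Sign in (34): minus the integral with delta'(x' - x'') gives the derivative f'(x'').
deriv = -integrate(lambda x: f(x) * nascent_delta_prime(x - point, eps), -1.0, 1.0)
print(sift, math.sin(point))
print(deriv, math.cos(point))
```

The small residual differences come from the finite width ε and shrink as ε is reduced.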

We again fix the lengths of our basic ψ's in such a way as to give a simple physical meaning to the modulus of the representative $(\xi'_1\xi'_2\ldots\xi'_n|)$, or (ξ'|) for brevity, of a normalized ψ. We can arrange for the probability of a simultaneous observation of all the ξ's, for the state represented by this ψ, yielding for each $\xi_m$ a result in the small domain between $\xi'_m$ and $\xi'_m + d\xi'_m$, to be

$$ |(\xi'|)|^2\,d\xi'_1\,d\xi'_2\cdots d\xi'_n. \tag{35} $$


The condition for this turns out to be, with the notation (33), formally the same as (21). In fact if we use, in conjunction with (33), the notation of letting dξ' denote the product $d\xi'_1\,d\xi'_2\cdots d\xi'_n$, and letting a single integral sign denote integration over all the variables ξ', the equations and results of the preceding section will all apply, without formal alteration, to our present case. Thus, for example, the matrix law of multiplication (26) will still apply and the representative of a numerical multiplier will still be given by (28). Equation (29), to be made definite, should be rewritten

$$ (\xi'|\xi_m|\xi'') = \xi'_m\,\delta(\xi' - \xi''), \tag{36} $$

and then applies to each $\xi_m$ of the set $\xi_1, \xi_2, \ldots, \xi_n$. In conformity with our general plan, we must now define a diagonal matrix as one whose (ξ', ξ'') element is equal to some function of the $\xi'_m$'s, or of the $\xi''_m$'s, multiplied into δ(ξ' − ξ''). This results in each $\xi_m$ being represented by a diagonal matrix. It can now easily be verified that all the theorems of §16 are valid also for the case of continuous eigenvalues.

A further generalization which we must make in our theory is to allow some of the ξ's in the complete commuting set to have discrete eigenvalues and others continuous eigenvalues. The alterations which this requires in the formalism are obvious. For each variable independently we must use either a sum or an integral, and either the two-suffix δ symbol or the δ function, according to whether it is discrete or continuous. We can make a transformation to another representation, in which each of a new complete set of commuting observables η is diagonal, with the help of transformation functions satisfying conditions which are an appropriate generalization of (30) or (31). There is no need for the number of discrete η's, or the number of continuous η's, to equal the number of discrete or continuous ξ's respectively. In fact the total number of η's in the η-set may differ from the total number of ξ's in the ξ-set. This non-conservation in the total number of observables is connected with the circumstance, which we found in §13, that two or more commuting observables may be counted as a single observable. We may at any time split up an observable into two commuting observables, a measurement of both of which is equivalent to a measurement of the original observable. For example, if the observable α does not have the eigenvalue zero, we may split it up into α² and |α|/α, a measurement of both of these being equivalent to a measurement of α.
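The splitting in the last example can be illustrated on a list of eigenvalues. The following is a minimal sketch that assumes a hypothetical spectrum of nonzero eigenvalues α'. Each pair of results (α'², |α'|/α') fixes α' uniquely, because α' = (|α'|/α')·√(α'²). By contrast, α² alone cannot tell α' from −α'.

```python
import math

def split(a):
    """Eigenvalues of the two commuting observables alpha**2 and |alpha|/alpha."""
    if a == 0:
        raise ValueError("the splitting assumes alpha has no zero eigenvalue")
    return a * a, abs(a) / a

def recombine(square, sign):
    """Recover the eigenvalue of alpha from the two measured results."""
    return sign * math.sqrt(square)

eigenvalues = [-2.5, -1.0, 0.5, 1.0, 3.0]  # hypothetical spectrum of alpha
pairs = [split(a) for a in eigenvalues]
print(pairs)
print([recombine(square, sign) for square, sign in pairs])
```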

We must make yet one more generalization in order to include all cases which may occur, namely we must allow any ξ to have as eigenvalues a discrete set of numbers together with a continuous range. This would give a representation theory in which sums and integrals occur added together in the formulae. The necessary alterations to be made in the formulae are obvious. For example, if we take the case of just one ξ with discrete eigenvalues denoted by ξ', ξ'', ... and continuous eigenvalues denoted by $\xi^{(1)}, \xi^{(2)}, \ldots$, the formula defining the representative of a ψ-vector, corresponding to (1), will be

$$ \psi = \sum_{\xi'} \psi(\xi')\,(\xi'|) + \int \psi(\xi^{(1)})\,d\xi^{(1)}\,(\xi^{(1)}|). \tag{37} $$


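The representative of a ψ will now consist of the discrete set of numbers (ξ'|) and the continuous range $(\xi^{(1)}|)$. These numbers may together be considered as forming a function of a variable whose domain consists of a discrete set of points together with a continuous range. They have the physical interpretation, when ψ is normalized, that |(ξ'|)|² is the probability of ξ having the value ξ' and $|(\xi^{(1)}|)|^2\,d\xi^{(1)}$ is the probability of ξ having a value in the small range $\xi^{(1)}$ to $\xi^{(1)} + d\xi^{(1)}$. The conditions for the basic ψ's, corresponding to (21), will be

$$ \phi(\xi')\,\psi(\xi'') = \delta_{\xi'\xi''}, \qquad \phi(\xi')\,\psi(\xi^{(1)}) = 0, \qquad \phi(\xi^{(1)})\,\psi(\xi') = 0, \qquad \phi(\xi^{(1)})\,\psi(\xi^{(2)}) = \delta(\xi^{(1)} - \xi^{(2)}). \tag{38} $$

The representative of a linear operator will have four kinds of elements, typified by (ξ'|α|ξ''), $(\xi'|\alpha|\xi^{(1)})$, $(\xi^{(1)}|\alpha|\xi')$ and $(\xi^{(1)}|\alpha|\xi^{(2)})$, and the matrix law of multiplication, corresponding to (26), will be

$$ \begin{aligned} (\xi'|\alpha\beta|\xi'') &= \sum_{\xi'''} (\xi'|\alpha|\xi''')(\xi'''|\beta|\xi'') + \int (\xi'|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi''), \\ (\xi'|\alpha\beta|\xi^{(1)}) &= \sum_{\xi''} (\xi'|\alpha|\xi'')(\xi''|\beta|\xi^{(1)}) + \int (\xi'|\alpha|\xi^{(2)})\,d\xi^{(2)}\,(\xi^{(2)}|\beta|\xi^{(1)}), \\ (\xi^{(1)}|\alpha\beta|\xi') &= \sum_{\xi''} (\xi^{(1)}|\alpha|\xi'')(\xi''|\beta|\xi') + \int (\xi^{(1)}|\alpha|\xi^{(2)})\,d\xi^{(2)}\,(\xi^{(2)}|\beta|\xi'), \\ (\xi^{(1)}|\alpha\beta|\xi^{(2)}) &= \sum_{\xi'} (\xi^{(1)}|\alpha|\xi')(\xi'|\beta|\xi^{(2)}) + \int (\xi^{(1)}|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi^{(2)}). \end{aligned} $$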


In the general case of many ξ's, the representative of a ψ will be a function whose domain may consist of several separate regions with different numbers of dimensions, and it may even be convenient to label the points of the various separate regions according to different schemes, referring perhaps to different sets of ξ's as diagonal matrices.


24. The Weight Function 


The foregoing discussion is sufficiently general to include all kinds of representations which can occur, but there is a purely formal generalization which it is sometimes desirable to make. This consists in introducing what is called a weight function into the expression (35) for the probability of the ξ's having values in certain specified small ranges for any given state. We replace (35) by

$$ |(\xi'|)|^2\,\rho(\xi')\,d\xi'_1\,d\xi'_2\cdots d\xi'_n, \tag{39} $$


where the weight function ρ(ξ') is any function of the variables ξ' which is greater than zero for all points in the domain of these variables. The use of a weight function is of no value from the point of view of general theory, but it is advantageous for certain special applications, for the purpose of increasing the symmetry of the equations or of giving a more direct physical interpretation to |(ξ'|)|² as a probability. For example, if two of the ξ's are the angle variables θ and φ which fix some direction in space, it would be convenient to take as weight function sin θ', in order to have the element of solid angle sin θ' dθ' dφ' in (39), so that we could interpret |(ξ'|)|² directly as a probability per unit solid angle. It would be permissible to use a weight function also in the case of discrete eigenvalues, but there do not seem to be any examples in which it is then any help.

The effect of the introduction of the weight function in the various formulae is easily investigated. The two probabilities (39) and (35) must, of course, be the same, so that, putting ρ(ξ') equal to ρ' for brevity, we may take the (ξ'|) in (39) to be $\rho'^{-1/2}$ times the (ξ'|) in (35). Formula (24) must now be replaced by

$$ \phi_x\psi_y = \int (x|\xi')\,\rho'\,d\xi'\,(\xi'|y), \tag{40} $$

the extra factor ρ' being inserted in the integrand to compensate for each of the factors (x|ξ') and (ξ'|y) having $\rho'^{-1/2}$ times its value in (24). We generalize this result to the assumption that the weight function ρ' is to be inserted as a factor to the differential dξ' in every formula involving an integration over the variables ξ', for example in (25) and (26). With this general assumption it is easily seen that all the formulae are still valid provided the various quantities they involve are all changed according to the following rules:

(i) The representative (ξ'|) or (|ξ') of any ψ- or φ-vector is multiplied by $\rho'^{-1/2}$, as we had above.

(ii) Each basic ψ, ψ(ξ'), and basic φ, φ(ξ'), is multiplied by $\rho'^{-1/2}$.

(iii) The representative (ξ'|α|ξ'') of any linear operator is multiplied by $\rho'^{-1/2}\rho''^{-1/2}$.

Thus formula (28), for instance, gets altered to

$$ (\xi'|k|\xi'') = k\,(\rho'\rho'')^{-1/2}\,\delta(\xi' - \xi'') = k\,\rho'^{-1}\,\delta(\xi' - \xi''), $$

and the representative of the identical operator becomes $\rho'^{-1}\,\delta(\xi' - \xi'')$.
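Rule (i) together with (40) can be checked numerically. The following is a minimal sketch that assumes a single angle variable θ on (0, π) with weight ρ(θ) = sin θ. It uses two real representatives chosen only for illustration, so that (x|θ') equals (θ'|x). Each representative is multiplied by ρ'^(-1/2), and ρ' is inserted into the integral. The result reproduces the unweighted product (24). The functions `rep_x` and `rep_y` are hypothetical.

```python
import math

def integrate(g, a, b, n=4000):
    """Midpoint rule on [a, b]; it never evaluates the endpoints, where sin vanishes."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def rho(theta):
    """Weight function sin(theta), positive on the open interval (0, pi)."""
    return math.sin(theta)

def rep_x(theta):
    """Hypothetical real representative (theta'|x) without weight function."""
    return math.cos(theta) + 0.5

def rep_y(theta):
    """Hypothetical real representative (theta'|y) without weight function."""
    return math.sin(2 * theta)

# (24): the product phi_x psi_y without a weight function.
plain = integrate(lambda t: rep_x(t) * rep_y(t), 0.0, math.pi)
# (40): representatives multiplied by rho'**-0.5 (rule (i)), with rho' inserted.
weighted = integrate(
    lambda t: (rep_x(t) / math.sqrt(rho(t))) * rho(t) * (rep_y(t) / math.sqrt(rho(t))),
    0.0, math.pi)
print(plain, weighted)  # both approximate 4/3 for these particular functions
```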

For a certain type of general theoretical investigation the use of a continuous range of eigenvalues in the representation is extremely inconvenient, and it becomes desirable, and is permissible, to replace the continuous range by a discrete set of points lying very close to one another over the whole range and eventually to pass to the limit when the density of these points is everywhere infinite. This procedure is equivalent to the introduction of a certain weight function, which tends to infinity in the limit. Let the number of discrete points in the small domain ξ' to ξ' + dξ' (which may be either one-dimensional or many-dimensional) be s'dξ', where s' = s(ξ') is any function of ξ' which is everywhere large. Thus s' is the density of the discrete points. The general formula connecting a sum over the discrete points with an integral over the continuous range is now

$$ \sum_{\xi'} f(\xi') = \int f(\xi')\,s'\,d\xi', \tag{41} $$

which shows that the discrete representation is equivalent to that continuous representation in which the weight function s has been introduced. We may now use the rules (i), (ii), (iii), with ρ replaced by s. We shall have, for example, using the suffix D to denote a representative in the discrete representation,

$$ \begin{aligned} (\xi'|)_D &= s'^{-1/2}\,(\xi'|), \qquad (|\xi')_D = s'^{-1/2}\,(|\xi'), \\ (\xi'|\alpha|\xi'')_D &= s'^{-1/2}\,(\xi'|\alpha|\xi'')\,s''^{-1/2}, \end{aligned} \tag{42} $$

and further

$$ \sum_{\xi'} (|\xi')_D\,(\xi'|)_D = \sum_{\xi'} s'^{-1}\,(|\xi')(\xi'|) = \int (|\xi')\,d\xi'\,(\xi'|) \tag{43} $$

from (41).
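Formula (41) and the bookkeeping behind (43) can be illustrated with a non-uniform set of points. The following is a minimal sketch that assumes a hypothetical density s(ξ') = N(1 + ξ') on 0 ≤ ξ' ≤ 1, with N = 2000. The points are placed so that the number lying below ξ' equals the integral of s, and the test function f(ξ') = ξ'² is an arbitrary choice. A plain sum over the points then approximates ∫ f s dξ', as in (41). A sum weighted by 1/s' approximates ∫ f dξ', which is the mechanism that turns the sum in (43) into an integral.

```python
import math

N = 2000  # overall scale of the density, large as the text requires (hypothetical value)

def density(x):
    """Hypothetical density s' = N (1 + x) of discrete points on [0, 1]."""
    return N * (1.0 + x)

def grid_points():
    """Place points so that the count below x equals the integral of s, N (x + x*x/2)."""
    count = int(round(N * 1.5))  # total number of points: integral of s over [0, 1]
    # Solve x + x*x/2 = (k + 0.5) / N for x, one point per unit of count.
    return [-1.0 + math.sqrt(1.0 + 2.0 * (k + 0.5) / N) for k in range(count)]

def f(x):
    return x * x

points = grid_points()
lhs_41 = sum(f(x) for x in points)              # sum over the discrete points
rhs_41 = N * (1.0 / 3.0 + 1.0 / 4.0)            # integral of f * s, evaluated by hand
plain = sum(f(x) / density(x) for x in points)  # sum weighted by 1/s', as in (43)
print(lhs_41, rhs_41)
print(plain)  # approximates the plain integral of f, which is 1/3
```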


V. THE QUANTUM CONDITIONS 


25. Poisson Brackets 


Our work so far has consisted in setting up a general mathematical scheme connecting states and observables in quantum mechanics. One of the dominant features of this scheme is that observables, and dynamical variables in general, appear in it as quantities which do not obey the commutative law of multiplication. It now becomes necessary for us to obtain equations to replace the commutative law of multiplication, equations that will tell us the value of ξη − ηξ when ξ and η are any two observables or dynamical variables. Only when such equations are known shall we have a complete scheme of mechanics with which to replace classical mechanics. These new equations are called quantum conditions or commutability relations.

The problem of finding quantum conditions is not of such a general character 
as those we have been concerned with up to the present. It is instead a special 
problem which presents itself with each particular dynamical system one is 
called upon to study. There is, however, a fairly general method of obtaining 
quantum conditions, applicable to a very large class of dynamical systems. 
This is the method of classical analogy and will form the main theme of 
the present chapter. Those dynamical systems to which this method is not 
applicable must be treated individually and special considerations used in 
each case. 

The value of classical analogy in the development of quantum mechanics 
depends on the fact that classical mechanics provides a valid description of 
dynamical systems under certain conditions, when the particles and bodies 
composing the systems are sufficiently massive for the disturbance accompanying 
an observation to be negligible. Classical mechanics must therefore be a limiting 
case of quantum mechanics. We should thus expect to find that important concepts 
in classical mechanics correspond to important concepts in quantum mechanics, 
and, from an understanding of the general nature of the analogy between classical 
and quantum mechanics, we may hope to get laws and theorems in quantum 
mechanics appearing as simple generalizations of well-known results in classical 
mechanics; in particular we may hope to get the quantum conditions appearing as 
a simple generalization of the classical law that all dynamical variables commute. 




One of the fundamental ideas of classical mechanics is that of generalized coordinates $q_r$ and their canonically conjugate momenta $p_r$. An idea possibly still more fundamental, however, is that of the Poisson Bracket. Any two dynamical variables ξ and η have a P.B. (Poisson Bracket) which we shall denote by [ξ, η], defined by

$$ [\xi, \eta] = \sum_r \left( \frac{\partial\xi}{\partial q_r}\frac{\partial\eta}{\partial p_r} - \frac{\partial\xi}{\partial p_r}\frac{\partial\eta}{\partial q_r} \right), \tag{1} $$

ξ and η being regarded as functions of the canonical coordinates and momenta $q_r$ and $p_r$ for the purpose of the differentiations. The P.B. owes its importance to its being invariant under a contact transformation, i.e. a transformation to a new set of canonical coordinates and momenta, so that it depends only on the two dynamical variables ξ and η to which it refers and is independent of which set of canonical coordinates and momenta one is using. The main properties of P.B.'s, which follow at once from their definition (1), are

$$ [\xi, \eta] = -[\eta, \xi], \tag{2} $$

$$ [\xi, c] = 0, \tag{3} $$

where c is a number,

$$ \begin{aligned} [\xi_1 + \xi_2, \eta] &= [\xi_1, \eta] + [\xi_2, \eta], \\ [\xi, \eta_1 + \eta_2] &= [\xi, \eta_1] + [\xi, \eta_2], \end{aligned} \tag{4} $$

$$ \begin{aligned} [\xi_1\xi_2, \eta] &= [\xi_1, \eta]\,\xi_2 + \xi_1[\xi_2, \eta], \\ [\xi, \eta_1\eta_2] &= [\xi, \eta_1]\,\eta_2 + \eta_1[\xi, \eta_2]. \end{aligned} \tag{5} $$

Also the identity

$$ [\xi, [\eta, \zeta]] + [\eta, [\zeta, \xi]] + [\zeta, [\xi, \eta]] = 0 \tag{6} $$

is easily verified. Equations (4) express that the P.B. [ξ, η] involves ξ and η linearly, while equations (5) correspond to the ordinary rules for differentiating a product.
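Because definition (1) only involves differentiation, properties (2) to (6) can be checked exactly for polynomial dynamical variables. The following is a minimal sketch that assumes one degree of freedom. It represents a polynomial in q and p as a dictionary that maps exponent pairs (i, j) to the rational coefficient of q^i p^j, a representation chosen here for convenience. The sample polynomials ξ, η and ζ are arbitrary.

```python
from fractions import Fraction

def mul(a, b):
    """Product of two polynomials {(i, j): coeff} in the commuting classical q, p."""
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {k: v for k, v in out.items() if v != 0}

def add(a, b, sign=1):
    """a + sign * b, dropping zero coefficients."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + sign * v
    return {k: v for k, v in out.items() if v != 0}

def d_dq(a):
    return {(i - 1, j): c * i for (i, j), c in a.items() if i > 0}

def d_dp(a):
    return {(i, j - 1): c * j for (i, j), c in a.items() if j > 0}

def pb(xi, eta):
    """Classical Poisson bracket (1) for one degree of freedom."""
    return add(mul(d_dq(xi), d_dp(eta)), mul(d_dp(xi), d_dq(eta)), sign=-1)

q = {(1, 0): Fraction(1)}
p = {(0, 1): Fraction(1)}
xi = {(2, 1): Fraction(3), (0, 2): Fraction(1, 2)}   # 3 q^2 p + p^2 / 2
eta = {(1, 2): Fraction(1), (3, 0): Fraction(-2)}    # q p^2 - 2 q^3
zeta = {(1, 1): Fraction(5), (0, 0): Fraction(7)}    # 5 q p + 7

print(pb(q, p))                        # [q, p] = 1, as in (8) below
print(add(pb(xi, eta), pb(eta, xi)))   # empty dict, by (2)
jacobi = add(add(pb(xi, pb(eta, zeta)), pb(eta, pb(zeta, xi))), pb(zeta, pb(xi, eta)))
print(jacobi)                          # empty dict, by (6)
```

Exact `Fraction` arithmetic is used so that the identities hold with no rounding error, which makes the empty results a genuine check rather than a tolerance test.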

Let us try to introduce a quantum P.B. which shall be the analogue of the classical one. We assume the quantum P.B. to satisfy all the conditions (2) to (6), it being now necessary that the order of the factors $\xi_1$ and $\xi_2$ in the first of equations (5) should be preserved throughout the equation, as in the way we have here written it, and similarly for the $\eta_1$ and $\eta_2$ in the second of equations (5). These conditions are already sufficient to determine the form of the quantum P.B. uniquely, as may be seen from the following argument. We can evaluate the P.B. $[\xi_1\xi_2, \eta_1\eta_2]$ in two different ways, since we can use either of the two formulas (5) first, thus,

$$ \begin{aligned} [\xi_1\xi_2, \eta_1\eta_2] &= [\xi_1, \eta_1\eta_2]\,\xi_2 + \xi_1[\xi_2, \eta_1\eta_2] \\ &= \{[\xi_1, \eta_1]\,\eta_2 + \eta_1[\xi_1, \eta_2]\}\,\xi_2 + \xi_1\{[\xi_2, \eta_1]\,\eta_2 + \eta_1[\xi_2, \eta_2]\} \\ &= [\xi_1, \eta_1]\,\eta_2\xi_2 + \eta_1[\xi_1, \eta_2]\,\xi_2 + \xi_1[\xi_2, \eta_1]\,\eta_2 + \xi_1\eta_1[\xi_2, \eta_2] \end{aligned} $$

and

$$ \begin{aligned} [\xi_1\xi_2, \eta_1\eta_2] &= [\xi_1\xi_2, \eta_1]\,\eta_2 + \eta_1[\xi_1\xi_2, \eta_2] \\ &= [\xi_1, \eta_1]\,\xi_2\eta_2 + \xi_1[\xi_2, \eta_1]\,\eta_2 + \eta_1[\xi_1, \eta_2]\,\xi_2 + \eta_1\xi_1[\xi_2, \eta_2]. \end{aligned} $$

Equating these two results, we obtain

$$ [\xi_1, \eta_1](\xi_2\eta_2 - \eta_2\xi_2) = (\xi_1\eta_1 - \eta_1\xi_1)[\xi_2, \eta_2]. $$

Since this condition holds with $\xi_1$ and $\eta_1$ quite independent of $\xi_2$ and $\eta_2$, we must have

$$ \begin{aligned} \xi_1\eta_1 - \eta_1\xi_1 &= i\hbar\,[\xi_1, \eta_1], \\ \xi_2\eta_2 - \eta_2\xi_2 &= i\hbar\,[\xi_2, \eta_2], \end{aligned} $$

where ħ must not depend on $\xi_1$ and $\eta_1$, nor on $\xi_2$ and $\eta_2$, and also must commute with $(\xi_1\eta_1 - \eta_1\xi_1)$. It follows that ħ must be simply a number. We want the P.B. of two observables or real variables to be real, as in the classical theory, which requires, from the work at the end of §15, that ħ shall be a real number when introduced, as here, with the coefficient i. We are thus led to the following general formula for the quantum P.B. [ξ, η] of any two variables ξ and η,

$$ \xi\eta - \eta\xi = i\hbar\,[\xi, \eta], \tag{7} $$

in which ħ is a new universal constant having the dimensions of action. In order that the theory may agree with experiment, we must take ħ equal to h/2π, where h is the universal constant that was introduced by Planck, known as Planck's constant. It is easily verified that the quantum P.B. satisfies all the conditions (2), (3), (4), (5) and (6).
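The closing claim can be spot-checked numerically. The following is a minimal sketch that assumes 2×2 complex matrices as stand-ins for noncommuting dynamical variables and works in units with ħ = 1 (both our choices). The particular matrices are arbitrary. Finite matrices cannot satisfy qp − pq = iħ, since the trace of a commutator is zero, but they are enough to check the algebraic rules. With [ξ, η] defined through (7), the product rule (5), with the factor order kept, and the identity (6) hold to rounding error.

```python
HBAR = 1.0  # units with hbar = 1 (assumption for the sketch)

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_add(a, b, sign=1):
    return [[a[i][j] + sign * b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def quantum_pb(x, y):
    """Quantum P.B. read off from (7): (xy - yx) / (i hbar)."""
    comm = mat_add(mat_mul(x, y), mat_mul(y, x), sign=-1)
    return [[c / (1j * HBAR) for c in row] for row in comm]

def max_abs(a):
    return max(abs(c) for row in a for c in row)

x1 = [[1, 2j], [0.5, -1]]
x2 = [[0, 1], [1, 3]]
y = [[2, -1j], [1j, 0.5]]
z = [[1, 1], [-1, 2j]]

# First of equations (5), with the order of x1 and x2 preserved.
lhs = quantum_pb(mat_mul(x1, x2), y)
rhs = mat_add(mat_mul(quantum_pb(x1, y), x2), mat_mul(x1, quantum_pb(x2, y)))
print(max_abs(mat_add(lhs, rhs, sign=-1)))

# Identity (6).
jac = mat_add(mat_add(quantum_pb(x1, quantum_pb(y, z)), quantum_pb(y, quantum_pb(z, x1))),
              quantum_pb(z, quantum_pb(x1, y)))
print(max_abs(jac))
```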

The problem of finding quantum conditions now reduces to the problem of determining P.B.'s in quantum mechanics. The strong analogy between the quantum P.B. defined by (7) and the classical P.B. defined by (1) leads us to make the assumption that the quantum P.B.'s, or at any rate the simpler ones of them, have the same values as the corresponding classical P.B.'s. The simplest P.B.'s are those involving the canonical coordinates and momenta themselves and have the following values in the classical theory:

$$ [q_r, q_s] = 0, \qquad [p_r, p_s] = 0, \qquad [q_r, p_s] = \delta_{rs}. \tag{8} $$




We therefore assume that the corresponding quantum P.B.'s also have the values given by (8). By eliminating the quantum P.B.'s with the help of (7), we obtain the equations

$$ \begin{aligned} q_r q_s - q_s q_r &= 0, \qquad p_r p_s - p_s p_r = 0, \\ q_r p_s - p_s q_r &= i\hbar\,\delta_{rs}, \end{aligned} \tag{9} $$

which are the fundamental quantum conditions. They show us where the lack of commutability among the canonical coordinates and momenta lies. They also provide us with a basis for calculating commutation relations between other dynamical variables. For instance, if ξ and η are any two functions of the q's and p's expressible as power series, we may express ξη − ηξ or [ξ, η], by repeated applications of the laws (2), (3), (4) and (5), in terms of the elementary P.B.'s given in (8) and so evaluate it. The result is often, in simple cases, the same as the classical result, or departs from the classical result only through requiring a special order for factors in a product, this order being, of course, unimportant in the classical theory. Even when ξ and η are more general functions of the q's and p's not expressible as power series, equations (9) are still sufficient to fix the value of ξη − ηξ, as will become clear from the following work. Equations (9) thus give the solution of the problem of finding the quantum conditions, for all those dynamical systems which have a classical analogue and which are describable in terms of canonical coordinates and momenta. This does not include all possible systems in quantum mechanics.

Equations (7) and (9) provide the foundation for the analogy between quantum mechanics and classical mechanics. They show that classical mechanics may be regarded as the limiting case of quantum mechanics when ħ tends to zero. A P.B. in quantum mechanics is a purely algebraic notion and is thus a rather more fundamental concept than a classical P.B., which can be defined only with reference to a set of canonical coordinates and momenta. For this reason canonical coordinates and momenta are of less importance in quantum mechanics than in classical mechanics; in fact, we may have a system in quantum mechanics for which canonical coordinates and momenta do not exist and we can still give a meaning to P.B.'s. Such a system would be one without a classical analogue and we should not be able to obtain its quantum conditions by the method here described.


26. Canonical Coordinates and Momenta 


Let us examine in greater detail the conditions (9) for canonical coordinates and momenta, which we assume to be all observables. One of the first things we notice is that two variables with different suffixes r and s always commute. It follows that any function of $q_r$ and $p_r$ will commute with any function of $q_s$ and $p_s$ when s differs from r. Thus dynamical variables referring to different degrees of freedom commute. This law, as we have derived it from (9), is proved only for dynamical systems with classical analogues, but we assume it to hold generally. In this way we can make a start on the problem of finding quantum conditions for dynamical systems for which canonical coordinates and momenta do not exist, in so far as we can give a meaning to degrees of freedom.

In applications of quantum mechanics it is often convenient to take two separate dynamical systems and to put them together and count them as forming a single system. This would be useful, for instance, if one wanted subsequently to introduce an interaction between the two systems and to treat this interaction perhaps by a perturbation method of the kind given in Chapter VIII. We can see from the above law how two dynamical systems may be counted as a single system. All the dynamical variables of one of the constituent systems must commute with all those of the other, since each of the two constituent systems has its own degrees of freedom. If we now take a complete set of commuting observables $\xi_1$ for the first constituent system and a complete set of commuting observables $\xi_2$ for the second, then it is easily seen that the $\xi_1$'s and $\xi_2$'s together form a complete set of commuting observables for the whole system; in fact, the basic ψ's, $\psi(\xi'_1\xi'_2)$, in the $(\xi_1, \xi_2)$-representation for the whole system may each be considered as a product of the basic ψ's, $\psi(\xi'_1)$ and $\psi(\xi'_2)$, in representations for the constituent systems, the ψ-space for the whole system being considered as the product of the ψ-spaces for the constituent systems. The product of representatives of ψ's for the constituent systems will give the representative of a ψ for the whole system, thus,

$$ (\xi'_1\xi'_2|) = (\xi'_1|)\,(\xi'_2|), \tag{10} $$

although, of course, the general ψ for the whole system will not be of the form of the right-hand side of (10), but will be a sum or integral of terms of this form. If $\eta_1$ and $\eta_2$ denote a second pair of complete sets of commuting observables for the two constituent systems respectively, the transformation function for the whole system will be just the product of the transformation functions for the constituent systems, thus,

$$ (\xi'_1\xi'_2|\eta'_1\eta'_2) = (\xi'_1|\eta'_1)\,(\xi'_2|\eta'_2). \tag{11} $$

The multiplication laws (10) and (11) apply, of course, to any division of the degrees of freedom of the whole system into two sets, even though these two sets do not correspond physically to two constituent systems. The generalization to more than two constituent systems, or more than two sets of degrees of freedom, can be made immediately.
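For finite discrete representations, (10) and (11) say that representatives and transformation functions of the whole system are Kronecker (tensor) products of those of the parts. The following is a minimal sketch that assumes two hypothetical two-state constituent systems, with real rotation matrices serving as their transformation functions (our choices). It checks two things. First, the product transformation function, built element by element as in (11), is again unitary. Second, transforming a product representative (10) to the η-representation gives the product of the separately transformed representatives.

```python
import math

def rotation(angle):
    """Real unitary 2x2 matrix used as a transformation function (xi'|eta')."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def kron(a, b):
    """(xi1' xi2'|eta1' eta2') = (xi1'|eta1') (xi2'|eta2'), labels ordered as pairs."""
    return [[a[i1][j1] * b[i2][j2] for j1 in range(len(a[0])) for j2 in range(len(b[0]))]
            for i1 in range(len(a)) for i2 in range(len(b))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]

T1, T2 = rotation(0.4), rotation(1.1)
T = kron(T1, T2)

# The product transformation function is unitary (real orthogonal here).
TtT = matmul(transpose(T), T)
print(max(abs(TtT[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4)))

# A product representative, as in (10), stays a product in the eta-representation.
v1, v2 = [0.6, 0.8], [1.0, 0.0]
whole = [a * b for a in v1 for b in v2]
lhs = apply(transpose(T), whole)  # (eta'|xi') is the transpose for real matrices
rhs = [a * b for a in apply(transpose(T1), v1) for b in apply(transpose(T2), v2)]
print(max(abs(x - y) for x, y in zip(lhs, rhs)))
```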
Let us now go back to the equations (9) and see what they tell us for a single degree of freedom. We have now just one q and one p, forming what is called in classical mechanics a pair of conjugate variables, and they satisfy

$$ qp - pq = i\hbar. \tag{12} $$

Equation (12) is the fundamental equation in quantum mechanics for a pair of conjugate variables describing a degree of freedom that has a classical analogue. It is of such frequent occurrence that its main algebraic consequences should be noted.

We have

$$q^2p - pq^2 = q(qp - pq) + (qp - pq)q = 2i\hbar q.$$

The more general formula

$$q^np - pq^n = ni\hbar q^{n-1} \tag{13}$$

is also valid. It is best proved by induction. Assuming (13) holds for one particular
value of n, we find

$$q^{n+1}p - pq^{n+1} = q(q^np - pq^n) + (qp - pq)q^n = q(ni\hbar)q^{n-1} + i\hbar q^n = (n+1)i\hbar q^n,$$

which is just (13) with n + 1 for n. Thus (13) holds generally. We may write (13)
in the form

$$q^np - pq^n = i\hbar\frac{d}{dq}q^n. \tag{14}$$

It follows that if f(q) is any function of q expressible as a power series,

$$fp - pf = i\hbar\frac{df}{dq}, \tag{15}$$


since we can apply (14) separately to each term in the expansion of f. At the end of
the next section we shall see that (15) holds also for more general functions f
that are not expressible as power series.
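Relations (13) and (15) can be checked exactly on polynomial representatives if one borrows the result of §27, where p acts on representatives as −iħ d/dq. This is a sketch under stated assumptions: units with ħ = 1, polynomials stored as lists of coefficients (entry k is the coefficient of q^k), and helper names chosen only for illustration.

```python
HBAR = 1.0   # units with hbar = 1 (assumption of this sketch)

def add(a, b, sign=1):
    """a + sign*b for coefficient lists of possibly different lengths."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x + sign * y for x, y in zip(a, b)]

def mul(a, b):
    """Product of two polynomials."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def deriv(c):
    """d/dq of a polynomial."""
    return [k * c[k] for k in range(1, len(c))] or [0]

def p_op(c):
    """p acting to the right as -i*hbar*d/dq (the q-representation of section 27)."""
    return [-1j * HBAR * x for x in deriv(c)]

def commutator(f, psi):
    """(f p - p f) psi for polynomials f and psi."""
    return add(mul(f, p_op(psi)), p_op(mul(f, psi)), sign=-1)

def rhs(f, psi):
    """i*hbar*(df/dq)*psi, the right-hand side of (15) applied to psi."""
    return mul([1j * HBAR * x for x in deriv(f)], psi)

def close(a, b, tol=1e-12):
    """Compare coefficient lists, padding the shorter one with zeros."""
    return all(abs(x) < tol for x in add(a, b, sign=-1))

psi = [2, -1, 0.5, 3]   # arbitrary test polynomial 2 - q + 0.5 q^2 + 3 q^3
for n in range(1, 6):
    q_n = [0] * n + [1]   # the polynomial q**n
    print(n, close(commutator(q_n, psi), rhs(q_n, psi)))   # True: (13) holds
```

Checking the commutator applied to a test polynomial, rather than as a bare operator, mirrors how (15) is used: both sides must agree on every ψ.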

There is one example of (15) of special interest, namely when f is
the exponential function $e^{icq}$, c being any real number. This exponential function
is defined in the usual way as the sum of the series

$$e^{icq} = \sum_{n=0}^{\infty}\frac{(icq)^n}{n!}, \tag{16}$$

and the ordinary exponential theorem must be valid for it, since there are no
non-commuting quantities occurring in (16) to make a difference from ordinary
algebra. With this expression for f, (15) becomes

$$e^{icq}p - pe^{icq} = -c\hbar e^{icq},$$

or

$$pe^{icq} = e^{icq}(p + c\hbar). \tag{17}$$
Let $\psi_{p'}$ be an eigen-ψ of p belonging to the eigenvalue p', so that

$$p\psi_{p'} = p'\psi_{p'}.$$

From (17) we obtain

$$pe^{icq}\psi_{p'} = e^{icq}(p + c\hbar)\psi_{p'} = e^{icq}(p' + c\hbar)\psi_{p'} = (p' + c\hbar)e^{icq}\psi_{p'}. \tag{18}$$

Thus $e^{icq}\psi_{p'}$ is an eigen-ψ of p belonging to the eigenvalue p' + cħ. In this way
we see that if p' is any eigenvalue of p, p' + cħ must be another. Since c is
an arbitrary real number, it follows that the eigenvalues of p must include all numbers from −∞
to ∞. Similarly it may be shown that the eigenvalues of q include all numbers from
−∞ to ∞. Hence canonical variables satisfying (12) or (9) have as eigenvalues all
numbers from −∞ to ∞. This result is known to be true from physical grounds
in the case when the canonical variables are Cartesian coordinates and momenta
of particles.
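The shift of eigenvalue in (18) can be seen numerically. This sketch uses results not yet available at this point in the text, taken as assumptions: the q-representation of §27, where p acts as −iħ d/dq, and the representative e^{ip'q/ħ} of an eigen-ψ of p found in §28. It takes ħ = 1 and approximates the derivative by a central difference.

```python
import cmath

HBAR = 1.0   # units with hbar = 1

def apply_p(f, q, h=1e-5):
    """p = -i*hbar*d/dq applied to the function f, evaluated at q (central difference)."""
    return -1j * HBAR * (f(q + h) - f(q - h)) / (2 * h)

def eigenvalue_after_shift(p_prime, c, q=0.37):
    """Ratio (p e^{icq} psi)/(e^{icq} psi) at q, where psi is the eigen-psi of p for p'."""
    psi = lambda x: cmath.exp(1j * p_prime * x / HBAR)
    shifted = lambda x: cmath.exp(1j * c * x) * psi(x)
    return apply_p(shifted, q) / shifted(q)

print(eigenvalue_after_shift(0.5, 2.0))   # close to p' + c*hbar = 2.5
```

The ratio comes out the same at every q, which is what it means for the shifted function to be an eigenfunction.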

A possible source of difficulty in the above deduction should be pointed out.
We could take c to be a pure[ly] imaginary or complex number and could then still
formally deduce (17) and (18). This would seem to lead to the result that p has
complex numbers as eigenvalues, whereas p, being, as we assumed at the beginning
of this section, an observable, can have only real eigenvalues. The solution of
the paradox lies in the fact that, for imaginary or complex c, the series (16) must
be counted as non-convergent and the operator $e^{icq}$ as not existing according to
our general theory of linear operators given in Chapters II to IV. The eigenvalues
of q extend to −∞ and ∞, and at one or other of these places $e^{icq}$ would tend to
infinity so rapidly as to be physically inadmissible as an operator that can operate
on ψ's representing states that actually exist to give other ψ's representing states
that actually exist.


27. Momenta as Differential Operators 


Let us take a dynamical system described by a set of canonical coordinates and
momenta $q_r$, $p_r$, and introduce a representation in which the coordinates $q_r$
are all diagonal. We may assume the $q_r$ to form a complete set of
commuting observables, the justification for this assumption being that it leads,
as we shall see, to a self-consistent scheme of representatives for the q's and p's
satisfying all the conditions (9). The representative of any ψ will thus be of
the form $(q_1'q_2'\ldots q_n'|)$. The domain of each of the variables q' extends from −∞
to ∞.

To begin our investigation, let us suppose our dynamical system has only one
degree of freedom, so that we have to deal with just one q and one p, satisfying (12),
and the representative of a ψ is simply (q'|). An interesting linear operator now
presents itself for study, namely the operator of differentiation of any (q'|) with
respect to q'. This linear operator can be applied to the representative of any ψ and
will give a function of q' which may be regarded as the representative of another ψ.
This linear operator may therefore be handled according to our general theory of
Chapters II to IV. Let us denote it by π in symbolic notation. It may be defined
by the condition that if


$$\psi = \int \psi(q')\,dq'\,(q'|), \quad\text{then}\quad \pi\psi = \int \psi(q')\,dq'\,\frac{d}{dq'}(q'|). \tag{19}$$

Its representative is thus determined by

$$\int (q'|\pi|q'')\,dq''\,(q''|) = \frac{d}{dq'}(q'|)$$

and is therefore

$$(q'|\pi|q'') = \delta'(q' - q''), \tag{20}$$

from (6) of Chapter IV. From (20) we can see that iπ is Hermitian. We can
also see that when π operates to the left on a φ-symbol, it is equivalent to minus
the operator of differentiation applied to the representative of that φ-symbol; thus

$$\int (|q')\,dq'\,(q'|\pi|q'') = -\frac{d}{dq''}(|q''),$$

so that if

$$\phi = \int (|q')\,dq'\,\phi(q'), \quad\text{then}\quad \phi\pi = -\int \frac{d}{dq'}(|q')\,dq'\,\phi(q'). \tag{21}$$


Let us now work out the commutability relation connecting π with q. We have
the equation†

$$\frac{d}{dq'}\{q'(q'|)\} = q'\frac{d}{dq'}(q'|) + (q'|),$$

which, written in symbolic notation, becomes

$$\pi q\psi = q\pi\psi + \psi.$$

This equation holds for arbitrary ψ and we may therefore cancel out the factor ψ.
We are then left with

$$\pi q - q\pi = 1, \tag{22}$$


1? omitted from the equation] 




which is the required commutability relation. It could alternatively have been obtained
directly from the representative (20) with the help of properties of
the δ function given in §21, in particular equation (10) there.
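Relation (22) can be checked on any smooth representative. In this sketch (an illustration, not part of the text) π is approximated by a central difference with a small step h, q is multiplication by q, and the test function is an arbitrary choice; the check is that (πq − qπ) applied to it returns the function itself.

```python
import math

def pi_op(f, h=1e-5):
    """Differentiation operator pi, approximated by a central difference."""
    return lambda q: (f(q + h) - f(q - h)) / (2 * h)

def q_op(f):
    """Multiplication by q."""
    return lambda q: q * f(q)

psi = lambda q: math.exp(-q * q) * math.cos(3 * q)   # arbitrary test representative

def commutator_applied(q):
    """((pi q - q pi) psi)(q); by (22) this should equal psi(q)."""
    return pi_op(q_op(psi))(q) - q_op(pi_op(psi))(q)

for q in (-1.2, 0.0, 0.4, 2.0):
    print(q, commutator_applied(q), psi(q))   # last two columns agree closely
```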

On comparing (22) with (12), we see that −iħπ satisfies the same
commutability relation with q that p does. Their difference, p + iħπ, therefore commutes
with q. From the theorem on page 56 [near the end of §16], and from our
assumption that q by itself forms a complete set of commuting observables, it
follows that p + iħπ must be a function of q, i.e.

$$p + i\hbar\pi = f(q). \tag{23}$$

Both p and iħπ are Hermitian operators, so the function f is real. We shall now
see that, by suitably choosing the phase factors in our q-representation, we can
arrange to make f vanish and p just equal −iħπ.

Let us take a new representation in which q is diagonal, differing from
our previous one in the phase factors of the basic ψ's, and let us use stars to denote
things referring to the new representation. The connexion between the basic ψ's
of the two representations will be of the form

$$\psi(q'^*) = \psi(q')e^{i\gamma'}, \tag{24}$$

where γ' = γ(q') is any real function of q'. The new representation will give us
a new operator π*, defined, in a corresponding way to (19), by the condition that if

$$\psi = \int \psi(q'^*)\,dq'\,(q'|)^*, \quad\text{then}\quad \pi^*\psi = \int \psi(q'^*)\,dq'\,\frac{d}{dq'}(q'|)^*. \tag{25}$$

Putting the same ψ on the left-hand sides of (19) and (25), we have

$$\int \psi(q')\,dq'\,(q'|) = \int \psi(q'^*)\,dq'\,(q'|)^* = \int \psi(q')\,dq'\,e^{i\gamma'}(q'|)^*,$$

so that

$$(q'|)^* = e^{-i\gamma'}(q'|).$$

The second of equations (25) now gives us

$$\pi^*\psi = \int \psi(q')e^{i\gamma'}\,dq'\,\frac{d}{dq'}\{e^{-i\gamma'}(q'|)\} = \int \psi(q')\,dq'\,\Big\{\frac{d}{dq'}(q'|) - i\frac{d\gamma'}{dq'}(q'|)\Big\} = \pi\psi - i\frac{d\gamma}{dq}\psi.$$




Since this holds for arbitrary ψ we have

$$\pi^* = \pi - i\frac{d\gamma}{dq}.$$

Hence

$$p + i\hbar\pi^* = f(q) + \hbar\frac{d\gamma}{dq}, \tag{26}$$

from (23). We may now choose the function γ, which has been left arbitrary up
to the present, so that the right-hand side of (26) vanishes, i.e. so that
dγ/dq = −f(q)/ħ. This will make p just equal to −iħπ*.
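The phase change from (24) to (26) can be verified numerically. This sketch assumes, purely for illustration, that in the original q-representation p acts as −iħ d/dq + f(q) with the hypothetical choice f(q) = q², and picks γ(q) = −q³/(3ħ) so that f + ħ dγ/dq = 0. It then compares the new representative of pψ, obtained by applying p in the old representation and multiplying by e^{−iγ}, with −iħ times the derivative of the new representative (q'|)* = e^{−iγ'}(q'|).

```python
import cmath
import math

HBAR = 1.0   # units with hbar = 1

def f(q):
    """Hypothetical real function f(q) in (23), so that p = -i*hbar*pi + f(q)."""
    return q * q

def gamma(q):
    """Phase function chosen so that f + hbar*dgamma/dq = 0."""
    return -q ** 3 / (3 * HBAR)

def d(g, q, h=1e-5):
    """Central-difference derivative of g at q."""
    return (g(q + h) - g(q - h)) / (2 * h)

old = lambda q: math.exp(-q * q) * (1 + 0.5 * q)     # old representative (q'|)
new = lambda q: cmath.exp(-1j * gamma(q)) * old(q)   # (q'|)* = e^{-i gamma'} (q'|)

def p_psi_new_via_old(q):
    """New representative of p psi: apply p in the old representation, then change phase."""
    p_psi_old = -1j * HBAR * d(old, q) + f(q) * old(q)
    return cmath.exp(-1j * gamma(q)) * p_psi_old

def minus_i_hbar_pi_star(q):
    """-i*hbar times differentiation of the new representative."""
    return -1j * HBAR * d(new, q)

for q in (-1.0, 0.3, 1.5):
    print(q, abs(p_psi_new_via_old(q) - minus_i_hbar_pi_star(q)))   # tiny
```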

We can easily extend this work to the case of n degrees of freedom. We then
have n differentiation operators $\pi_r$, one for each degree of freedom, and we define
them by replacing the second of equations (19) by

$$\pi_r\psi = \int \psi(q')\,dq'\,\frac{\partial}{\partial q_r'}(q'|). \tag{27}$$

The representative of $\pi_r$ will be

$$(q'|\pi_r|q'') = \delta(q_1' - q_1'')\,\delta(q_2' - q_2'')\cdots\delta'(q_r' - q_r'')\cdots\delta(q_n' - q_n''), \tag{28}$$

like the representative in the right-hand side of equation (34) of Chapter IV.
From the form of this representative we again see that $i\pi_r$ is Hermitian and that,
when $\pi_r$ operates to the left on a φ-symbol, it is equivalent to minus the operator
of differentiation with respect to $q_r$ applied to the representative of that φ-symbol.
To obtain the commutability relations for the $\pi_r$'s, we note first that

$$\frac{\partial^2}{\partial q_r'\,\partial q_s'}(q'|) = \frac{\partial^2}{\partial q_s'\,\partial q_r'}(q'|),$$

which, written in symbolic notation, gives us

$$\pi_r\pi_s\psi = \pi_s\pi_r\psi.$$

This holds for arbitrary ψ and hence

$$\pi_r\pi_s - \pi_s\pi_r = 0. \tag{29}$$

Again†,

$$\frac{\partial}{\partial q_s'}\{q_r'(q'|)\} = q_r'\frac{\partial}{\partial q_s'}(q'|) + \delta_{rs}(q'|),$$


tT? omitted] 




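which, written in symbolic notation, gives us

$$\pi_sq_r\psi = q_r\pi_s\psi + \delta_{rs}\psi,$$

and hence

$$\pi_sq_r - q_r\pi_s = \delta_{rs}. \tag{30}$$

Comparing (29) and (30) with (9), we see that the operators $-i\hbar\pi_r$ satisfy the same
commutability relations with the q's and with each other as the $p_r$ do. We can get
a generalization of (30) by noticing that, if f(q) is any function of the q's,

$$\frac{\partial}{\partial q_s'}\{f(q')(q'|)\} = f(q')\frac{\partial}{\partial q_s'}(q'|) + \frac{\partial f(q')}{\partial q_s'}(q'|).$$

Written in symbolic notation, this gives us

$$\pi_sf\psi = f\pi_s\psi + \frac{\partial f}{\partial q_s}\psi,$$

and hence

$$\pi_sf - f\pi_s = \frac{\partial f}{\partial q_s}. \tag{31}$$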


From (30) and the corresponding equations for the $p_r$ in (9), we see that each
of the operators $p_s + i\hbar\pi_s$ commutes with each of the q's. It follows as before,
from the theorem on page 56 [near the end of §16], that each $p_s + i\hbar\pi_s$ is a function
of the q's only, i.e.

$$p_s + i\hbar\pi_s = f_s(q), \tag{32}$$

$f_s(q)$ being a function of the q's, necessarily real. Using (29) and the corresponding
equation for the p's in (9), we now obtain

$$\begin{aligned}0 &= p_rp_s - p_sp_r\\ &= (-i\hbar\pi_r + f_r)(-i\hbar\pi_s + f_s) - (-i\hbar\pi_s + f_s)(-i\hbar\pi_r + f_r)\\ &= -i\hbar(\pi_rf_s + f_r\pi_s - \pi_sf_r - f_s\pi_r),\end{aligned}$$

or

$$\pi_rf_s - f_s\pi_r = \pi_sf_r - f_r\pi_s.$$

This gives, with the help of (31),

$$\frac{\partial f_s}{\partial q_r} = \frac{\partial f_r}{\partial q_s},$$

which shows that the functions $f_r$ are all of the form

$$f_r = \frac{\partial F}{\partial q_r},$$




where F is a function of the q's independent of r. (It may be taken to be
a real function.) Equations (32) now become

$$p_s = -i\hbar\pi_s + \frac{\partial F}{\partial q_s}. \tag{33}$$

Let us now, as in the case of one degree of freedom, introduce a new
representation in which the q's are all diagonal, differing from the previous one
in the phase factors of the basic ψ's, and let us again use stars to denote things
referring to the new representation. The connexion between the basic ψ's of the two
representations will again be of the form (24), γ' now being an arbitrary function
of all the variables q'. The new differentiation operators $\pi_r^*$ will be defined by
equations (25), with the second of these equations replaced by

$$\pi_r^*\psi = \int \psi(q'^*)\,dq'\,\frac{\partial}{\partial q_r'}(q'|)^*,$$

corresponding to (27). We now obtain, by similar analysis to that which led to (26),
the result

$$p_s + i\hbar\pi_s^* = \frac{\partial F}{\partial q_s} + \hbar\frac{\partial\gamma}{\partial q_s}.$$

Comparing this with (33), we see that, if we take γ = −F/ħ, each $p_s$ becomes just
equal to $-i\hbar\pi_s^*$.

We have now established the general result that, by suitably choosing the phase
factors in a representation in which the q's are diagonal, we can make each of
the momenta conjugate to the q's take the form

$$p_s = -i\hbar\pi_s, \tag{34}$$

$\pi_s$ being the operator which, operating to the right on a ψ-vector, is equivalent
to differentiation of the representative of that ψ-vector with respect to $q_s$,
and operating to the left on a φ-vector, is equivalent to minus the differentiation
of the representative of that φ-vector with respect to $q_s$. The interpretation of $\pi_s$ as
a differentiation or as minus a differentiation, according to whether it is multiplied
to the right or to the left, is easily seen to apply generally, also when the thing
it is multiplied into is not a ψ or a φ. Thus, for example, with $\pi_s$ multiplied into
a linear operator ξ, we have

$$(q'|\pi_s\xi|q'') = \frac{\partial}{\partial q_s'}(q'|\xi|q''),$$

$$(q'|\xi\pi_s|q'') = -\frac{\partial}{\partial q_s''}(q'|\xi|q''),$$


giving

$$(q'|p_s\xi|q'') = -i\hbar\frac{\partial}{\partial q_s'}(q'|\xi|q''), \qquad (q'|\xi p_s|q'') = i\hbar\frac{\partial}{\partial q_s''}(q'|\xi|q''). \tag{35}$$
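The left/right rule behind (35) has a simple matrix counterpart. This sketch is not Dirac's construction: it discretizes q' on a uniform grid (size and spacing `STEP` chosen arbitrarily), replaces π by the antisymmetric central-difference matrix D, and uses an arbitrary smooth matrix `XI` in place of (q'|ξ|q''). For interior entries, D·XI differences the first index, while XI·D gives minus the difference in the second index, mirroring the two lines of (35) once π is multiplied by −iħ.

```python
import math

N, STEP = 40, 0.1   # number of grid points and spacing (illustrative choices)
grid = [(i - N / 2) * STEP for i in range(N)]

# Central-difference matrix D standing in for pi: (D psi)_i = (psi_{i+1} - psi_{i-1}) / (2 STEP)
D = [[0.0] * N for _ in range(N)]
for i in range(N):
    if i + 1 < N:
        D[i][i + 1] = 1 / (2 * STEP)
    if i >= 1:
        D[i][i - 1] = -1 / (2 * STEP)

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Arbitrary smooth matrix standing in for (q'|xi|q'')
XI = [[math.exp(-(x - y) ** 2) * math.cos(x + 2 * y) for y in grid] for x in grid]

left = matmul(D, XI)    # pi multiplied into xi from the left
right = matmul(XI, D)   # pi multiplied into xi from the right

i, j = 17, 23   # an interior entry
print(left[i][j], (XI[i + 1][j] - XI[i - 1][j]) / (2 * STEP))     # difference in q'
print(right[i][j], -(XI[i][j + 1] - XI[i][j - 1]) / (2 * STEP))   # minus difference in q''
```

The minus sign on the right comes entirely from the antisymmetry of D, the discrete analogue of iπ being Hermitian.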


The result (34) is a very valuable one in applications of quantum mechanics.
It is a consequence only of the quantum conditions (9) and may be regarded
as a new way of expressing these conditions. We may illustrate its value
by taking any function $H(q_r, p_r)$ of the q's and p's expressible as a power
series in the p's. This function must be equal to $H(q_r, -i\hbar\pi_r)$ and therefore,
as an operator operating to the right on a ψ-vector, it must be equivalent to
the differential operator

$$H\Big(q_r', -i\hbar\frac{\partial}{\partial q_r'}\Big), \tag{36}$$

in which each $\pi_r$ has been replaced by $\partial/\partial q_r'$, without any alteration of the order of
factors in products, operating on the representative (q'|) of this ψ-vector. Thus H
becomes expressed as a familiar kind of differential operator. The equation for
determining the eigenvalues of H is†

$$H\Big(q_r', -i\hbar\frac{\partial}{\partial q_r'}\Big)(q'|) = H'(q'|), \tag{37}$$

which is just a partial differential equation for the unknown function (q'|) and
the unknown eigenvalue H'. A solution (q'|) of an equation like (37) is called
an eigenfunction of the relevant operator H. Equations of the form (37) were
introduced into quantum mechanics by Erwin Schrödinger.
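As an illustration of (36) and (37), this sketch takes the hypothetical Hamiltonian H = p²/2m + mω²q²/2 (a harmonic oscillator, not discussed at this point in the text), so that (36) becomes −(ħ²/2m) d²/dq'² + mω²q'²/2. With ħ = m = ω = 1 and a finite-difference second derivative, it checks that two standard functions satisfy (37): the ratio H(q'|)/(q'|) is the same at every q', and that common value is the eigenvalue H'.

```python
import math

HBAR = M = OMEGA = 1.0   # illustrative units

def h_apply(psi, q, h=1e-4):
    """(36) for H = p**2/(2M) + M*OMEGA**2*q**2/2: -(hbar^2/2M) psi'' + (M OMEGA^2 q^2/2) psi."""
    second = (psi(q + h) - 2 * psi(q) + psi(q - h)) / h ** 2
    return -HBAR ** 2 / (2 * M) * second + 0.5 * M * OMEGA ** 2 * q * q * psi(q)

def ground(q):
    """Gaussian trial eigenfunction."""
    return math.exp(-M * OMEGA * q * q / (2 * HBAR))

def first(q):
    """q times the Gaussian, a second trial eigenfunction."""
    return q * ground(q)

def local_eigenvalue(psi, q):
    """H(q'|)/(q'|) at q'; independent of q' exactly when (37) holds."""
    return h_apply(psi, q) / psi(q)

for q in (-1.5, -0.3, 0.7, 1.9):
    print(q, round(local_eigenvalue(ground, q), 6), round(local_eigenvalue(first, q), 6))
```

The printed columns stay near 0.5 and 1.5 at every sample point, so each function solves (37) with that H'.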

We can now understand rather better the meaning of the indeterminacy
in a representation when only the observables that are to be diagonal in it
are specified, at any rate for the case when these observables are a set of
canonical coordinates. Corresponding to each representation in which the q's
are diagonal there exists one set of momenta conjugate to the q's [i.e. satisfying
the same conditions as the p's in (9)] whose representatives are of the specially
simple form (34) instead of the more general form (33). If we take some
particular set of momenta conjugate to the q's and require that these shall have
representatives of the specially simple form, the representation is then completely
determined, except for a trivial phase factor $e^{i\delta}$ where δ is independent of
the variables q', since the function γ above is completely determined by the condition
that each $-i\hbar\pi_s$ shall equal $p_s$, except for an arbitrary constant.


1¢” omitted from the equation] 




As a corollary to the above work we may note that, from (31) applied in
a representation in which (34) holds,

$$fp_s - p_sf = i\hbar\frac{\partial f}{\partial q_s}. \tag{38}$$

This is the generalization of equation (15) to the case of several degrees of
freedom and the case when f is a function of the q's not necessarily expressible as
a power series.
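Equation (38) can be spot-checked for two degrees of freedom and a function f that has no power-series expansion about q1 = 0. In this sketch the choices f = |q1|^{3/2} cos q2 and the test representative are hypothetical, ħ = 1, p_s acts as −iħ ∂/∂q_s according to (34), and partial derivatives are central differences; the two sides of (38), applied to the test representative, are compared at a few points.

```python
import math

HBAR = 1.0   # units with hbar = 1

def f(q1, q2):
    """Hypothetical f(q): no power-series expansion about q1 = 0."""
    return abs(q1) ** 1.5 * math.cos(q2)

def psi(q1, q2):
    """Arbitrary test representative (q1' q2' |)."""
    return math.exp(-(q1 ** 2 + 2 * q2 ** 2)) * (1 + q1 - 0.3 * q2)

def partial(g, s, q, h=1e-5):
    """Central-difference partial derivative of g with respect to q[s]."""
    up, dn = list(q), list(q)
    up[s] += h
    dn[s] -= h
    return (g(*up) - g(*dn)) / (2 * h)

def p_applied(g, s, q):
    """(p_s g)(q) with p_s = -i*hbar*d/dq_s, as in (34)."""
    return -1j * HBAR * partial(g, s, q)

def lhs(s, q):
    """((f p_s - p_s f) psi)(q)."""
    f_psi = lambda *x: f(*x) * psi(*x)
    return f(*q) * p_applied(psi, s, q) - p_applied(f_psi, s, q)

def rhs(s, q):
    """(i*hbar*(df/dq_s) psi)(q), the right-hand side of (38) applied to psi."""
    return 1j * HBAR * partial(f, s, q) * psi(*q)

for s in (0, 1):
    for q in ((0.4, -0.7), (-1.1, 0.2)):
        print(s, q, abs(lhs(s, q) - rhs(s, q)))   # small, limited by the difference step
```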


28. Heisenberg’s Principle of Uncertainty 


On account of the general symmetry between the q’s and the p’s in the quantum 
conditions (9), it must be possible to interchange q’s and p’s throughout the work of 
the preceding section. This would mean setting up a representation in which the p's
are diagonal and each q is represented by the operator ±iħ times differentiation
with respect to the corresponding p, the + sign being taken when it operates to
the right and the − sign when it operates to the left. (These signs are just the other
way round to what we had in the preceding section.) The two representations
would be equally fundamental from the point of view of general theory. In practice, 
the representation in which the q’s are diagonal is the more useful one in general, 
since most of the dynamical quantities one has to deal with are expressible as 
power series in the p’s (usually of degree not higher than two), but are not 
expressible as power series in the q’s, and so would take the form of differential 
operators like (36) in the q-representation, but would not take the form of 
differential operators in the p-representation. There are, however, some problems 
in which the p-representation can be used with advantage, and it becomes desirable 
to calculate the transformation function connecting the two representations. 

Let us take the case of the system with one degree of freedom and calculate 
the transformation function (q'|p') connecting the representation in which q is 
diagonal with that in which p is diagonal. We shall use the general method 
described at the end of §18. Equation (52) of that section, applied to our present 
problem, reads 


$$\int (q'|p|q'')\,dq''\,(q''|p') = p'(q'|p').$$

We can evaluate the left-hand side of this equation according to (34), provided
the phase factors of the q-representation are suitably chosen. This gives

$$-i\hbar\frac{d}{dq'}(q'|p') = p'(q'|p').$$

The solution of this differential equation for (q'|p') is

$$(q'|p') = a'e^{iq'p'/\hbar}, \tag{39}$$




where a' = a(p') is an arbitrary function of p'. Note that (39) gives the general
form of the q-representative of an eigen-ψ of p belonging to the eigenvalue p'.
We can determine the modulus of a' by using the normalizing condition

$$\int (p'|q')\,dq'\,(q'|p'') = \delta(p' - p''),$$

which comes from (30) of Chapter IV. This gives, when we put

$$(p'|q') = \overline{(q'|p')} = \bar{a}'e^{-iq'p'/\hbar},$$

the equation

$$\bar{a}'a''\int e^{iq'(p''-p')/\hbar}\,dq' = \delta(p' - p''),$$

where a'' = a(p''). Integrating the left-hand side with the help of (15) of
Chapter IV, we obtain

$$2\pi\bar{a}'a''\,\delta\{(p'' - p')/\hbar\} = \delta(p' - p''),$$

and hence, from (8), (11) and (14) of Chapter IV,

$$h|a'|^2 = 1,$$

where h = 2πħ is Planck's constant. Thus a' is of the form $h^{-1/2}e^{i\gamma'}$, where γ' is some real
function of p', and hence

$$(q'|p') = h^{-1/2}e^{i\gamma'}e^{iq'p'/\hbar}.$$

By suitably choosing the phase factors of the p-representation [those of
the q-representation were chosen when we made use of (34), but those of
the p-representation are still arbitrary] we may remove the factor $e^{i\gamma'}$, leaving as
final result

$$(q'|p') = h^{-1/2}e^{iq'p'/\hbar}. \tag{40}$$
The result (40) shows that the formulas connecting the q- and p-representatives
of a ψ-vector are

$$(p'|) = h^{-1/2}\int e^{-iq'p'/\hbar}\,dq'\,(q'|), \qquad (q'|) = h^{-1/2}\int e^{iq'p'/\hbar}\,dp'\,(p'|). \tag{41}$$


These formulas have an elementary significance. They show that either of 
the representatives is given, apart from numerical coefficients, by the amplitudes 
of the Fourier components of the other. 
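The first of equations (41) can be evaluated numerically for a concrete wave packet. In this sketch the assumptions are: ħ = 1, so Dirac's h = 2πħ = 2π; a normalized Gaussian packet with width parameter `SIGMA` and central momentum `P0`, both chosen for illustration; and the integral approximated by a Riemann sum on a finite grid. The computed (p'|) is compared with the closed-form Fourier transform of the same Gaussian, which is concentrated around p' = P0.

```python
import cmath
import math

HBAR = 1.0                      # units with hbar = 1
H_PLANCK = 2 * math.pi * HBAR   # Dirac's h

SIGMA, P0 = 0.8, 3.0            # illustrative packet width and central momentum
DQ = 0.02
Q_GRID = [-10 + k * DQ for k in range(1001)]   # q' from -10 to 10

def packet(q):
    """Normalized Gaussian wave packet (q'|) with mean momentum P0."""
    envelope = (2 * math.pi * SIGMA ** 2) ** -0.25 * math.exp(-q * q / (4 * SIGMA ** 2))
    return envelope * cmath.exp(1j * P0 * q / HBAR)

def p_rep(p):
    """(p'|) from the first of equations (41), as a Riemann sum over Q_GRID."""
    total = sum(cmath.exp(-1j * q * p / HBAR) * packet(q) for q in Q_GRID) * DQ
    return total / math.sqrt(H_PLANCK)

def p_rep_exact(p):
    """Closed-form transform of the same Gaussian (valid for HBAR = 1)."""
    return (2 * SIGMA ** 2 / math.pi) ** 0.25 * math.exp(-SIGMA ** 2 * (p - P0) ** 2)

for p in (1.5, 2.4, 3.0, 4.2):
    print(p, abs(p_rep(p)), p_rep_exact(p))   # numerical and exact moduli agree
```

Making `SIGMA` smaller widens the packet in p', in line with the reciprocal widths discussed next.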




It is interesting to apply (41) to a ψ-vector whose q-representative consists
of what is called a wave packet. This is a function whose value is very small
everywhere outside a certain domain, of width Δq' say, and inside this domain is
approximately periodic with a definite frequency.† If a Fourier analysis is made
of such a wave packet, the amplitudes of all the Fourier components will be small,
except those in the neighbourhood of the definite frequency. The components
whose amplitudes are not small will fill up a frequency band whose width is of
the order 1/Δq', since two components whose frequencies differ by this amount,
if in phase in the middle of the domain Δq', will be just out of phase and interfering
at the ends of this domain. Now in the first of equations (41) the variable p'/h
plays the part of frequency. Thus with (q'|) of the form of a wave packet,
the function (p'|), being composed of the amplitudes of the Fourier components of
the wave packet, will be small everywhere in the p'-space outside a certain domain
of width Δp' = h/Δq'.

Let us now apply the physical interpretation of the square of the modulus of
the representative of a ψ as a probability. We find that our wave packet represents
a state for which a measurement of q is almost certain to lead to a result lying
in a domain of width Δq' and a measurement of p is almost certain to lead to
a result lying in a domain of width Δp'. We may say that for this state q has
a definite value with an error of order Δq' and p has a definite value with an error
of order Δp'. The product of these two errors is

$$\Delta q'\,\Delta p' = h. \tag{42}$$

Thus the more accurately one of the variables q or p has a definite value, the less
accurately the other has a definite value. In the limit when one of them is
completely determined, the other is completely undetermined. This last result can
be obtained more directly from the transformation function (q'|p'). According to
the end of §22, $|(q'|p')|^2\,dq'$ is proportional to the probability of q having a value
in the small range from q' to q' + dq' for the state for which p certainly has
the value p', and from (40) this probability is independent of q' for a given dq'.
Thus if p certainly has a definite value p', all values of q are equally probable.
Similarly it may be shown that if q certainly has a definite value q', all values of p
are equally probable.

Equation (42) is known as Heisenberg's Principle of Uncertainty. It shows
clearly the limitations in the possibility of simultaneously assigning
numerical values, for any particular state, to two non-commuting observables,
when those observables are canonically conjugate variables, and provides a plain
illustration of how observations in quantum mechanics may be incompatible.
It also shows how classical mechanics, which assumes that numerical values can
be assigned simultaneously to all observables, may be a valid approximation when
h can be considered as small enough to be negligible. Equation (42) holds only
in the most favourable case, which occurs when the representative of the state is
of the form of a wave packet. Other forms of representative would lead to a Δq'
and Δp' whose product is larger than h.

† Frequency here means reciprocal of wave-length.

The foregoing work can be easily extended to systems with several degrees 
of freedom. The transformation function connecting the q- and p-representations 
when there are n degrees of freedom is, according to the law (11), just the product 
of the transformation functions for each degree of freedom separately, namely 


$$(q_1'q_2'\ldots q_n'|p_1'p_2'\ldots p_n') = (q_1'|p_1')(q_2'|p_2')\cdots(q_n'|p_n') = h^{-n/2}e^{i(q_1'p_1' + q_2'p_2' + \cdots + q_n'p_n')/\hbar}. \tag{43}$$


The idea of a wave packet can be extended to the case of several q’s, 
the function (q’|) having to be very small everywhere outside a certain domain 
of the q’-space and approximately periodic in each of the q’’s inside this domain. 
The principle of uncertainty then applies to each degree of freedom separately. 
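The claim that the wave-packet form is the most favourable case can be explored numerically, with one change of definition: this sketch measures spreads by root-mean-square deviations rather than by Dirac's order-of-magnitude widths Δq' and Δp', so its numbers differ from (42) by a constant factor. It assumes ħ = 1 and real, even representatives (so the mean values of q and p vanish), and it computes ⟨p²⟩ = ħ²∫|d(q'|)/dq'|² dq' using (34). Gaussian profiles of different widths give the same product, while a sech profile, chosen only as a non-Gaussian example, gives a larger one.

```python
import math

HBAR = 1.0
DQ = 0.002
GRID = [-25 + k * DQ for k in range(25001)]   # q' grid, wide enough for the profiles below

def spreads(psi, h=1e-5):
    """RMS spreads (sigma_q, sigma_p) for a real, even representative psi."""
    norm = sum(psi(q) ** 2 for q in GRID) * DQ
    q2 = sum(q * q * psi(q) ** 2 for q in GRID) * DQ / norm
    # <p^2> = hbar^2 * integral (d psi/dq)^2 dq / norm, using p = -i*hbar*d/dq
    dpsi = lambda q: (psi(q + h) - psi(q - h)) / (2 * h)
    p2 = HBAR ** 2 * sum(dpsi(q) ** 2 for q in GRID) * DQ / norm
    return math.sqrt(q2), math.sqrt(p2)

for s in (0.5, 1.0, 2.0):   # Gaussian wave packets of different widths
    sq, sp = spreads(lambda q, s=s: math.exp(-q * q / (4 * s * s)))
    print("gaussian", s, round(sq, 4), round(sp, 4), round(sq * sp, 4))

sq, sp = spreads(lambda q: 1 / math.cosh(q))   # a non-Gaussian profile
print("sech", round(sq, 4), round(sp, 4), round(sq * sp, 4))
```

Narrowing the Gaussian in q' widens it in p' by the same factor, which is the reciprocity behind (42).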


29. Displacement Operators 


An instructive way of looking at some of the quantum conditions is provided by 
a study of displacement operators. These appear in the theory when we take into 
consideration that the scheme of relations between states and observables given in 
Chapter II is essentially a physical scheme, so that if certain states and observables 
are connected by some relation, on our displacing them all in a definite way 
(for example, displacing them all through a distance δx in the direction of the x-axis
of Cartesian coordinates), the new states and observables would have to be 
connected by the same relation. 

The displacement of a state or observable is a perfectly definite process 
physically. Thus to displace a state or observable through a distance δx in
the direction of the x-axis, we should merely have to displace all the apparatus used
in preparing the state, or all the apparatus required to measure the observable,
through the distance δx in the direction of the x-axis, and the displaced apparatus
would define the displaced state or observable. A displaced state or observable 
is uniquely determined by the undisplaced state or observable together with 
the direction and magnitude of the displacement. 

The displacement of a ψ-vector is not such a definite thing, though. If we take
a certain ψ-vector, it will represent a certain state and we may displace this state
and get a perfectly definite new state, but this new state will not determine our
displaced ψ, but only the direction of our displaced ψ. We help to fix our displaced
ψ by requiring that it shall have the same length as the undisplaced ψ, but even
then it is not completely determined, but can still be multiplied by an arbitrary
phase factor. One would think at first sight that each ψ one displaces would
have a different independent phase factor, but with the help of the following
extra condition, we see that they must all have the same. We here make
use of the law that superposition relationships between states remain invariant
under the displacement. A superposition relationship between states is expressed
mathematically by a linear equation between the ψ's representing those states,
for example

$$\psi_0 = c_1\psi_1 + c_2\psi_2, \tag{44}$$

where $c_1$ and $c_2$ are numbers, and the invariance of the superposition relationship
requires that the displaced states can be represented by ψ's with the same linear
equation between them; in our example they could be represented by $\psi_0^d$, $\psi_1^d$
and $\psi_2^d$ satisfying

$$\psi_0^d = c_1\psi_1^d + c_2\psi_2^d. \tag{45}$$

We now take such ψ's to be our displaced ψ's, that is to say, we require that
any linear equations holding between our undisplaced ψ's shall hold also between
our displaced ψ's. This makes it impossible to provide our displaced ψ's with
independently variable phase factors, as these would spoil the linear equations
[for example, (45) would cease to be valid if we multiplied $\psi_0^d$, $\psi_1^d$ and $\psi_2^d$ by different
factors $e^{i\gamma_0}$, $e^{i\gamma_1}$ and $e^{i\gamma_2}$], and the only arbitrariness left in the displaced ψ's is
that of a single arbitrary phase factor to be multiplied into them all.

With the displacement of ψ's made fairly definite in the above manner and
the displacement of φ's, of course, made equally definite, through their being
the conjugate imaginaries of the ψ's, we can now assert that any symbolic
equation between ψ's, φ's, and linear operators must remain invariant under
the displacement of every symbol occurring in it, on account of such an equation
having some physical significance which will not get changed by the displacement.
Take, for example, the equation

$$\phi_1\psi_1 = c, \tag{46}$$

$\phi_1$ being any φ-vector, $\psi_1$ any ψ-vector, and c a number, equal to their
scalar product. The assertion that this equation remains invariant under
the displacement may be written, if we use a superscript d generally to denote a displaced
quantity,

$$\phi_1^d\psi_1^d = c = \phi_1\psi_1, \tag{47}$$

and is thus equivalent to the assertion that the scalar product $\phi_1\psi_1$ is invariant.
Now a scalar product $\phi_1\psi_1$ may be regarded as a specification of the extent
to which the two states represented by $\phi_1$ and $\psi_1$ approximate to being orthogonal
(it vanishes when they are orthogonal), and the assertion of its invariance
is justified on account of the notion of orthogonality of two states being
a physical notion, unaffected by an equal displacement of both states. Again,
an equation of the type

$$\xi\psi_a = \psi_b, \tag{48}$$

ξ being any observable, denotes some physical relation between the observable ξ
and the two states represented by $\psi_a$ and $\psi_b$, although this relation cannot be
described in an elementary way. This physical relation must be invariant under
the displacement and hence equation (48) must be invariant.

To deal mathematically with the invariance of equations like (46) and (48), it is
convenient to introduce a process of differentiation, denoted by $D_x$, defined by

$$D_x\psi = \lim_{\delta x\to 0}\frac{\psi^d - \psi}{\delta x}, \qquad D_x\phi = \lim_{\delta x\to 0}\frac{\phi^d - \phi}{\delta x}, \qquad D_x\xi = \lim_{\delta x\to 0}\frac{\xi^d - \xi}{\delta x}, \tag{49}$$

the superscript d denoting a quantity displaced through a distance δx in the direction of
the x-axis. There will be some lack of determinacy in $D_x\psi$ due to the arbitrary
phase factor by which we may multiply all our displaced ψ's. Taking new
displaced ψ's equal to $e^{i\gamma}$ times the previous ones, we get a new $D_x\psi$, say $D_x^*\psi$,
defined by

$$D_x^*\psi = \lim_{\delta x\to 0}\frac{e^{i\gamma}\psi^d - \psi}{\delta x} = D_x\psi + ia\psi, \tag{50}$$

where a is a real number and is the limit of γ/δx. (We must choose γ so
that this limit exists in order that $D_x^*$ may have a meaning.) There will be
a corresponding lack of determinacy in $D_x\phi$, but none in $D_x\xi$. Applying our
differentiation process $D_x$, which is subject to the usual law for the differentiation
of a product, to equations (46) and (48), we get

$$(D_x\phi_1)\psi_1 + \phi_1(D_x\psi_1) = 0 \tag{51}$$

and

$$(D_x\xi)\psi_a + \xi(D_x\psi_a) = D_x\psi_b. \tag{52}$$

These equations must hold for each of the various meanings of $D_x$ arising from
its lack of determinacy.




The condition that linear equations between the ψ's remain invariant
under the displacement and that an equation such as (45) holds whenever
the corresponding (44) holds, means that the displaced ψ's are linear functions of
the undisplaced ψ's and that each displaced ψ is the result of some linear operator†
applied to the corresponding undisplaced ψ. In symbols,

$$\psi_1^d = A\psi_1, \tag{53}$$

where A is a linear operator independent of $\psi_1$ and depending only on
the displacement. It follows that $D_x\psi_1$ must also be the result of some linear
operator applied to $\psi_1$. We call this linear operator the displacement operator $d_x$,
thus

$$D_x\psi_1 = d_x\psi_1. \tag{54}$$

Alternatively, we could define $d_x$ directly in terms of the operator A of
equation (53), as

$$d_x = \lim_{\delta x\to 0}\frac{A - 1}{\delta x}. \tag{55}$$

From (50) we see that the lack of determinacy in $d_x$ consists in the possibility of
adding to it an arbitrary‡ imaginary number.
Let us see how to express $D_x\phi_1$ and $D_x\xi$ in terms of $d_x$, $\phi_1$ and ξ. From (51)
and (54) we get

$$(D_x\phi_1)\psi_1 + \phi_1d_x\psi_1 = 0,$$

and since this holds for arbitrary $\psi_1$, we must have

$$D_x\phi_1 = -\phi_1d_x. \tag{56}$$

This result shows that $d_x$ is an‡ imaginary operator (i times a Hermitian operator),
since, $D_x\phi_1$ being the conjugate imaginary of $D_x\psi_1$, it gives $-\phi_1d_x$ as the conjugate
imaginary of $d_x\psi_1$. From (52) and (54) we get

$$(D_x\xi)\psi_a + \xi d_x\psi_a = d_x\psi_b = d_x\xi\psi_a$$

from (48). Since this holds for arbitrary $\psi_a$, we must have

$$D_x\xi = d_x\xi - \xi d_x. \tag{57}$$

We can see from this result how it is that the lack of determinacy in $d_x$, consisting
in the possibility of adding to it an arbitrary‡ imaginary number, is not associated
with any lack of determinacy in $D_x\xi$.

† This follows at once (with the definition of a linear operator given in §8) from the invariance
of the linear equation expressing an arbitrary ψ in terms of the basic ψ's of a representation.
‡ [‘pure’ omitted]




Let us now introduce a set of canonical coordinates and momenta consisting
of x, y and z, the Cartesian coordinates of the centre of gravity of our system,
and $p_x$, $p_y$ and $p_z$, the components of the total momentum of the system, which are
the conjugates of x, y and z, together with any other coordinates and momenta
that may be necessary for describing internal degrees of freedom of the system.
If we suppose a piece of apparatus, which has been set up to measure x, to be
displaced a distance δx in the direction of the x-axis, it will measure x − δx. Thus

$$x^d = x - \delta x,$$

and therefore, from the third of equations (49),

$$D_xx = -1. \tag{58}$$

From (57) we now find

$$xd_x - d_xx = 1.$$

This is the quantum condition connecting $d_x$ with x. From similar
arguments we find that each of the other canonical coordinates and momenta
introduced above, since it is unaffected by the displacement, must commute with
$d_x$. Comparing these results with (9), we see that $i\hbar d_x$ satisfies just the same
quantum conditions as $p_x$. Their difference, $p_x - i\hbar d_x$, commutes with all the
coordinates and momenta and must therefore be a number. This number, which
is necessarily real since $p_x$ and $i\hbar d_x$ are both Hermitian operators, may be made
zero by a suitable choice of the arbitrary‡ imaginary number that can be added to
$d_x$. We then have the result

$$p_x = i\hbar d_x, \tag{59}$$

or the x-component of the total momentum of the system is iħ times
the displacement operator $d_x$.
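The identification (59) can be checked on a one-dimensional representative. This sketch assumes that displacing a state through δx along the x-axis turns its x-representative ψ(x) into ψ(x − δx), which is consistent with x^d = x − δx above, and it takes ħ = 1 with a small but finite `DELTA` in place of the limit in (55). It compares iħ d_x ψ with −iħ dψ/dx, the form of p_x given by (34), and checks the quantum condition x d_x − d_x x = 1.

```python
import math

HBAR = 1.0
DELTA = 1e-7   # small finite displacement standing in for the limit in (55)

def psi(x):
    """Arbitrary test representative."""
    return math.exp(-(x - 0.3) ** 2) * math.sin(2 * x + 0.5)

def dpsi_exact(x):
    """Exact derivative of psi, for comparison."""
    g = math.exp(-(x - 0.3) ** 2)
    return g * (-2 * (x - 0.3) * math.sin(2 * x + 0.5) + 2 * math.cos(2 * x + 0.5))

def displaced(f, dx):
    """Representative of the state displaced through dx along the x-axis."""
    return lambda x: f(x - dx)

def d_x(f):
    """Displacement operator (A - 1)/dx, with A the displacement, at finite DELTA."""
    return lambda x: (displaced(f, DELTA)(x) - f(x)) / DELTA

def commutator(x):
    """((x d_x - d_x x) psi)(x); the quantum condition says this equals psi(x)."""
    xpsi = lambda y: y * psi(y)
    return x * d_x(psi)(x) - d_x(xpsi)(x)

for x in (-0.8, 0.3, 1.4):
    p_via_displacement = 1j * HBAR * d_x(psi)(x)
    p_via_derivative = -1j * HBAR * dpsi_exact(x)
    print(x, abs(p_via_displacement - p_via_derivative), abs(commutator(x) - psi(x)))
```

Both printed differences shrink in proportion to `DELTA`, as expected for a one-sided difference.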

This is a fundamental result, which gives a new significance to displacement
operators. There is a corresponding result, of course, also for the y and z
displacement operators $d_y$ and $d_z$. The quantum conditions which state that $p_x$,
$p_y$ and $p_z$ commute with each other are now seen to be connected with the fact
that displacements in different directions are commutable operations.

We can build up a similar theory for rotation operators about the $x$, $y$ and $z$
axes. These linear operators, $d_\xi$, $d_\eta$ and $d_\zeta$ say, are found, in the same way as $d_x$,
to be imaginary and to be undetermined to the extent of arbitrary imaginary
additive numbers. Their quantum conditions may be easily calculated and turn
out to be, apart from the factor $i\hbar$, the same as those for the components of angular
momentum of the system (as they will be deduced in §38), so that we can identify
$i\hbar d_\xi$, $i\hbar d_\eta$ and $i\hbar d_\zeta$ with the components of angular momentum. An interesting
consequence of this result is that if a state, represented by $\psi$, has zero angular
momentum, then

$$d_\xi\psi = d_\eta\psi = d_\zeta\psi = 0,$$

which requires that $\psi$ shall be spherically symmetrical. Thus a state of zero angular
momentum is necessarily spherically symmetrical.
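A small numerical sketch can make the last step plausible. It assumes the Schrödinger representation, where the $z$-component of angular momentum acts on a wave function, up to a constant factor, as $x\,\partial/\partial y - y\,\partial/\partial x$ (a standard form, not derived here). Estimating that operator by central differences shows it annihilating a function of $r^2 = x^2 + y^2 + z^2$ but not a function with a preferred direction. Only the $z$-component is checked; the other two follow by relabelling axes. The test functions, the sample point and the step size are arbitrary choices.

```python
import math


def rotation_generator_z(f, x, y, z, h=1e-5):
    """Apply x d/dy - y d/dx to f at (x, y, z) by central differences.

    Up to a constant factor this is the z-component of angular momentum
    acting on a wave function f(x, y, z) in the Schrodinger representation.
    """
    df_dx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    df_dy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    return x * df_dy - y * df_dx


def symmetric(x, y, z):
    """Depends on position only through r, so it is spherically symmetrical."""
    return math.exp(-(x * x + y * y + z * z))


def asymmetric(x, y, z):
    """A function with a preferred direction (the x-axis)."""
    return x * math.exp(-(x * x + y * y + z * z))


point = (0.3, -0.7, 0.2)
print(rotation_generator_z(symmetric, *point))    # close to 0
print(rotation_generator_z(asymmetric, *point))   # clearly nonzero
```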


30. Contact Transformations 


Let $U$ be any linear operator that has a reciprocal $U^{-1}$ and consider the equation

$$\alpha^* = U\alpha U^{-1}, \tag{60}$$

$\alpha$ being an arbitrary linear operator. This equation may be regarded as expressing
a transformation from any linear operator $\alpha$ to a corresponding linear operator $\alpha^*$,
and as such it has rather remarkable properties. In the first place it should be
noted that each $\alpha^*$ has the same eigenvalues as the corresponding $\alpha$; since, if $\alpha'$ is
any eigenvalue of $\alpha$ and $\psi_{\alpha'}$ is the eigen-$\psi$ belonging to it, we have

$$\alpha\psi_{\alpha'} = \alpha'\psi_{\alpha'}$$

and hence

$$\alpha^* U\psi_{\alpha'} = U\alpha U^{-1}U\psi_{\alpha'} = U\alpha\psi_{\alpha'} = \alpha' U\psi_{\alpha'},$$

showing that $U\psi_{\alpha'}$ is an eigen-$\psi$ of $\alpha^*$ belonging to the same eigenvalue $\alpha'$,
and similarly any eigenvalue of $\alpha^*$ may be shown to be also an eigenvalue
of $\alpha$. Further, if we take several $\alpha$'s that are connected by algebraic equations
and transform them all according to (60), the corresponding $\alpha^*$'s will be
connected by the same algebraic equations. This result follows from the fact
that the fundamental algebraic processes of addition and multiplication are left
invariant by the transformation (60), as is shown by the following equations:

$$(\alpha_1 + \alpha_2)^* = U(\alpha_1 + \alpha_2)U^{-1} = U\alpha_1U^{-1} + U\alpha_2U^{-1} = \alpha_1^* + \alpha_2^*,$$

$$(\alpha_1\alpha_2)^* = U\alpha_1\alpha_2U^{-1} = U\alpha_1U^{-1}\,U\alpha_2U^{-1} = \alpha_1^*\alpha_2^*.$$
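The two properties just proved, that (60) preserves eigenvalues and preserves sums and products, can be spot-checked numerically. The sketch below uses plain nested lists, picks an arbitrary invertible $2\times 2$ matrix for $U$ (not necessarily unitary, since (60) does not require it) and two arbitrary matrices for $\alpha_1$ and $\alpha_2$. Equal trace and determinant serve as a proxy for equal eigenvalues, since for $2\times 2$ matrices they fix the characteristic polynomial. All numerical values are illustrative assumptions.

```python
def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def inverse(U):
    """Reciprocal of an invertible 2x2 matrix."""
    d = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    return [[U[1][1] / d, -U[0][1] / d], [-U[1][0] / d, U[0][0] / d]]


def transform(U, alpha):
    """alpha* = U alpha U^{-1}, the transformation (60)."""
    return matmul(matmul(U, alpha), inverse(U))


def trace(A):
    return A[0][0] + A[1][1]


def det(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def close(A, B, tol=1e-10):
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


U = [[1 + 1j, 2], [0.5, 3 - 1j]]     # any matrix with a reciprocal (det = 3 + 2j)
a1 = [[1, 2j], [0, -1]]
a2 = [[0.5, 1], [1, 0.5]]
a1s, a2s = transform(U, a1), transform(U, a2)

# Same trace and determinant: same characteristic polynomial, same eigenvalues.
print(trace(a1s), det(a1s))
# Sums and products carry over term by term.
print(close(transform(U, add(a1, a2)), add(a1s, a2s)))
print(close(transform(U, matmul(a1, a2)), matmul(a1s, a2s)))
```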


Let us now see what condition would be imposed on $U$ by the requirement that
any Hermitian $\alpha$ shall be transformed into a Hermitian $\alpha^*$. Equation (60) may
be written

$$\alpha^* U = U\alpha. \tag{61}$$

Taking the conjugate complex of both sides in accordance with (19) of Chapter II
we find, if $\alpha$ and $\alpha^*$ are both Hermitian,

$$\bar U\alpha^* = \alpha\bar U. \tag{62}$$

Equation (61) gives us

$$\bar U\alpha^* U = \bar U U\alpha$$

and equation (62) gives us

$$\bar U\alpha^* U = \alpha\bar U U.$$

Hence

$$\bar U U\alpha = \alpha\bar U U.$$

Thus $\bar U U$ commutes with any Hermitian operator and therefore also with any linear
operator whatever, since any linear operator can be expressed as one Hermitian
operator plus $i$ times another. It follows that $\bar U U$ is a number. By taking a matrix
representation we can easily see that this number must be real and positive. We can
suppose it to be unity without any loss of generality in the transformation (60).
We then have

$$\bar U U = 1. \tag{63}$$

Equation (63) is equivalent to any of the following

$$\bar U = U^{-1}, \qquad U = \bar U^{-1}, \qquad U\bar U = 1. \tag{64}$$


A matrix or linear operator $U$ that satisfies (63) and (64) is said
to be unitary and a transformation (60) with unitary $U$ is called a unitary
transformation. A unitary transformation transforms Hermitian operators into
Hermitian operators. Also it transforms linear operators satisfying the expansion
theorem into linear operators satisfying the expansion theorem, since, if $\alpha$ satisfies
the expansion theorem, we can expand $U^{-1}\psi$, where $\psi$ is arbitrary, in terms of $\psi_{\alpha'}$'s
and by multiplying this result by $U$, we get $\psi$ expanded in terms of $\psi$-vectors of
the form $U\psi_{\alpha'}$, each of which is an eigen-$\psi$ of $\alpha^*$. We can now see that a unitary
transformation transforms observables into observables. It leaves invariant any
algebraic equation between the observables and also, as may easily be verified,
any functional relation based on the general definition of a function given in §11.

The inverse of a unitary transformation is also a unitary transformation,
owing to the fact, which follows from (64), that if $U$ is unitary, $U^{-1}$ is also unitary.
Further, if two unitary transformations are applied in succession, the result is
a third unitary transformation, as may be verified in the following way. Let the two
unitary transformations be (60) and

$$\alpha^\dagger = V\alpha^* V^{-1}.$$

The connexion between $\alpha^\dagger$ and $\alpha$ is then

$$\alpha^\dagger = VU\alpha U^{-1}V^{-1} = (VU)\,\alpha\,(VU)^{-1} \tag{65}$$

from (32) of Chapter II. Now $VU$ is unitary since

$$\overline{VU}\,VU = \bar U\bar V VU = \bar U U = 1,$$

and hence (65) is a unitary transformation.
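A minimal numerical check of the unitary conditions follows, assuming finite $2\times 2$ matrices stand in for the linear operators and that the conjugate $\bar U$ is represented by the conjugate transpose. It confirms (63) for a rotation $U$ and a phase matrix $V$, confirms that the product $VU$ is again unitary, and checks that the transformation (65) carries a Hermitian matrix into a Hermitian one. The particular matrices are arbitrary choices.

```python
import cmath
import math


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def adjoint(A):
    """Conjugate transpose: the matrix of the conjugate operator (U-bar in the text)."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def is_identity(A, tol=1e-12):
    return all(abs(A[i][j] - (1 if i == j else 0)) < tol for i in range(2) for j in range(2))


def is_unitary(U):
    """Condition (63): U-bar U = 1."""
    return is_identity(matmul(adjoint(U), U))


def is_hermitian(A, tol=1e-12):
    B = adjoint(A)
    return all(abs(A[i][j] - B[i][j]) < tol for i in range(2) for j in range(2))


theta, phi = 0.4, 1.1
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]   # a real rotation
V = [[cmath.exp(1j * phi), 0], [0, 1]]                                           # a phase change
VU = matmul(V, U)

alpha = [[1, 2 - 1j], [2 + 1j, -3]]                     # Hermitian
alpha_dagger = matmul(matmul(VU, alpha), adjoint(VU))   # (65), using (VU)^{-1} = adjoint(VU)

print(is_unitary(U), is_unitary(V), is_unitary(VU))    # True True True
print(is_hermitian(alpha_dagger))                      # True
```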




A transformation from one set of canonical coordinates and momenta $q_r$, $p_r$
to another set $q_r^*$, $p_r^*$ is called in quantum mechanics, as in classical
mechanics, a contact transformation. In quantum mechanics the conditions for
a set of variables to be canonical are algebraic, namely equations (9), which makes
the theory of contact transformations more elementary than in classical mechanics.
We shall now see that quantum contact transformations are the same as the above
unitary transformations.

Let us consider the contact transformation from the canonical variables $q_r$, $p_r$ to
the canonical variables $q_r^*$, $p_r^*$. We shall use two representations in which the $q$'s and
the $q^*$'s respectively are diagonal, the phase factors of these representations being
such that equation (34) and the corresponding equation for the starred variables
hold. We introduce the linear operator $U$ whose mixed representative $(q^{*\prime}|U|q'')$ is
defined to be

$$(q^{*\prime}|U|q'') = \delta(q^{*\prime} - q''). \tag{66}$$

[The right-hand side of (66) has a meaning since each $q_r$ and $q_r^*$ takes on all
values from $-\infty$ to $\infty$.] We note in the first place that $U$ is unitary, since,
using fundamental equations of the transformation theory,†


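$$(q'|U|q'') = \int (q'|q^{*\prime\prime\prime})\, dq^{*\prime\prime\prime}\, (q^{*\prime\prime\prime}|U|q'') = (q'|q^{*\prime\prime}),$$

so that

$$(q''|\bar U|q') = \overline{(q'|q^{*\prime\prime})} = (q^{*\prime\prime}|q'),$$

and hence

$$(q'|U\bar U|q''') = \int (q'|U|q'')\, dq''\, (q''|\bar U|q''') = \int (q'|q^{*\prime\prime})\, dq^{*\prime\prime}\, (q^{*\prime\prime}|q''') = \delta(q' - q'''),$$

so that $U\bar U = 1$. We have further

$$(q^{*\prime}|q_r^* U|q'') = q_r^{*\prime}\, \delta(q^{*\prime} - q'')$$

and

$$(q^{*\prime}|U q_r|q'') = \delta(q^{*\prime} - q'')\, q_r''.$$

The right-hand sides of these two equations are equal on account of (9) of
Chapter IV (the factors $q_r^{*\prime}$ and $q_r''$ may be interchanged in front of the $\delta$ function), and hence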

$$q_r^* U = U q_r,$$

or

$$q_r^* = U q_r U^{-1}.$$

† In this piece of analysis we use the notation that a $q_r$ and a $q_r^*$ with the same number of
primes both denote the same number. Thus, for example, $q' = q^{*\prime}$. It is necessary to retain
both symbols for the same number in order to preserve the meaning of bracket expressions,
such as $(q'|q^{*\prime\prime})$.


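Again, according to the rules expressed by (35), which are valid also for mixed
representatives,

$$(q^{*\prime}|p_r^* U|q'') = -i\hbar \frac{\partial}{\partial q_r^{*\prime}} \delta(q^{*\prime} - q''),$$

$$(q^{*\prime}|U p_r|q'') = i\hbar \frac{\partial}{\partial q_r''} \delta(q^{*\prime} - q'').$$

The right-hand sides of these two equations are obviously equal (the $\delta$ function depends only on the difference $q^{*\prime} - q''$), and hence

$$p_r^* U = U p_r,$$

or

$$p_r^* = U p_r U^{-1}.$$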

This establishes that a contact transformation is just a unitary transformation 
of the form (60). The converse result, that a unitary transformation applied to 
a set of canonical variables gives a contact transformation, is obvious, owing to 
the invariance of algebraic relations under a unitary transformation. We can 
now give a meaning to contact transformations for dynamical systems in which 
canonical coordinates and momenta do not exist, defining such transformations 
simply as unitary transformations. 

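One of the ways of expressing the conditions for a contact transformation in
classical mechanics is

$$p_r = \frac{\partial S}{\partial q_r}, \qquad p_r^* = -\frac{\partial S}{\partial q_r^*}, \tag{67}$$

$S$ being some function of the $q$'s and $q^*$'s. There is a quantum analogue of this.
We define the quantum $S$ by

$$(q'|q^{*\prime\prime}) = e^{iS(q', q^{*\prime\prime})/\hbar}. \tag{68}$$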


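We now have

$$(q'|p_r|q^{*\prime\prime}) = \int (q'|p_r|q''')\, dq'''\, (q'''|q^{*\prime\prime}) = -i\hbar \frac{\partial}{\partial q_r'} (q'|q^{*\prime\prime}) = \frac{\partial S(q', q^{*\prime\prime})}{\partial q_r'}\, (q'|q^{*\prime\prime}). \tag{69}$$

Similarly,

$$(q'|p_r^*|q^{*\prime\prime}) = \int (q'|q^{*\prime\prime\prime})\, dq^{*\prime\prime\prime}\, (q^{*\prime\prime\prime}|p_r^*|q^{*\prime\prime}) = i\hbar \frac{\partial}{\partial q_r^{*\prime\prime}} (q'|q^{*\prime\prime}) = -\frac{\partial S(q', q^{*\prime\prime})}{\partial q_r^{*\prime\prime}}\, (q'|q^{*\prime\prime}). \tag{70}$$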


From equation (50) of Chapter III we have

$$(q'|f(q)\,g(q^*)|q^{*\prime\prime}) = f(q')\,g(q^{*\prime\prime})\,(q'|q^{*\prime\prime}), \tag{71}$$

where $f(q)$ and $g(q^*)$ are functions of the $q$'s and $q^*$'s respectively. Let $B(q, q^*)$
be any function of the $q$'s and $q^*$'s consisting of a sum of terms each of the form
$f(q)g(q^*)$, so that all the $q$'s in $B$ occur to the left of all the $q^*$'s. Such a function
we call well-ordered. Applying (71) to each of the terms in $B$ and adding, we get

$$(q'|B(q, q^*)|q^{*\prime\prime}) = B(q', q^{*\prime\prime})\,(q'|q^{*\prime\prime}). \tag{72}$$

Now let us suppose each $p_r$ and $p_r^*$ can be expressed as a well-ordered function of
the $q$'s and $q^*$'s and write these functions $p_r(q, q^*)$, $p_r^*(q, q^*)$. We shall then have,
from (72),

$$(q'|p_r|q^{*\prime\prime}) = p_r(q', q^{*\prime\prime})\,(q'|q^{*\prime\prime}), \tag{73}$$

$$(q'|p_r^*|q^{*\prime\prime}) = p_r^*(q', q^{*\prime\prime})\,(q'|q^{*\prime\prime}). \tag{74}$$

Comparing (73) with (69) and (74) with (70), we see that

$$p_r(q', q^{*\prime\prime}) = \frac{\partial S(q', q^{*\prime\prime})}{\partial q_r'}, \qquad p_r^*(q', q^{*\prime\prime}) = -\frac{\partial S(q', q^{*\prime\prime})}{\partial q_r^{*\prime\prime}}.$$

This means that

$$p_r = \frac{\partial S(q, q^*)}{\partial q_r}, \qquad p_r^* = -\frac{\partial S(q, q^*)}{\partial q_r^*}, \tag{75}$$

provided the right-hand sides of (75) are written as well-ordered functions.
Thus the classical equations (67) for a contact transformation hold also in
the quantum theory when the non-commuting variables $q$ and $q^*$ in their right-hand
sides are suitably ordered.

We get an infinitesimal contact or unitary transformation by taking $U$ in (60)
to differ by an infinitesimal from unity. Put

$$U = 1 + i\epsilon F,$$

where $\epsilon$ is infinitesimal, so that its square can be neglected. Then

$$U^{-1} = 1 - i\epsilon F.$$

The unitary condition (63) or (64) requires that $F$ shall be Hermitian.
The transformation equation (60) now takes the form

$$\alpha^* = (1 + i\epsilon F)\,\alpha\,(1 - i\epsilon F),$$

which gives

$$\alpha^* - \alpha = i\epsilon(F\alpha - \alpha F). \tag{76}$$

It may be written in P.B. notation

$$\alpha^* - \alpha = \epsilon\hbar[\alpha, F], \tag{77}$$

when it is formally the same as a classical infinitesimal contact transformation.
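The first-order relation (76) can be checked numerically. This sketch assumes units with $\hbar = 1$ and uses $U = e^{i\epsilon F}$, which is exactly unitary and agrees with $1 + i\epsilon F$ to first order in $\epsilon$, so $(\alpha^* - \alpha)/\epsilon$ should approach $i(F\alpha - \alpha F)$ as $\epsilon$ becomes small. It also evaluates the quantum P.B. in the form $[\alpha, F] = (\alpha F - F\alpha)/i\hbar$ to show that (76) and (77) agree. The Taylor-series `expm` helper and all numerical values are illustrative assumptions.

```python
def matmul(A, B):
    """Product of square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def scale(c, A):
    """Multiply every entry of A by the number c."""
    return [[c * x for x in row] for row in A]


def expm(A, terms=30):
    """Matrix exponential by its Taylor series (adequate for small matrices of modest norm)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, A))
        result = [[x + y for x, y in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


HBAR = 1.0                                          # assumption: units with hbar = 1
F = [[0.5, 0.2 - 0.3j], [0.2 + 0.3j, -0.4]]         # Hermitian generator
alpha = [[1.0, 2.0], [0.5j, -1.0]]                  # arbitrary linear operator
eps = 1e-6

U = expm(scale(1j * eps, F))       # exactly unitary; equals 1 + i eps F up to O(eps^2)
U_inv = expm(scale(-1j * eps, F))
alpha_star = matmul(matmul(U, alpha), U_inv)

# Left side of (76) divided by eps, and the right side i(F alpha - alpha F).
lhs = [[(alpha_star[i][j] - alpha[i][j]) / eps for j in range(2)] for i in range(2)]
comm = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(matmul(F, alpha), matmul(alpha, F))]
rhs = scale(1j, comm)

# Quantum P.B. [alpha, F] = (alpha F - F alpha) / (i hbar); (77) says hbar * [alpha, F] = rhs.
pb = [[(x - y) / (1j * HBAR) for x, y in zip(r1, r2)]
      for r1, r2 in zip(matmul(alpha, F), matmul(F, alpha))]

print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))   # of order eps
```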


VI. THE EQUATIONS OF MOTION


31. Schrödinger's Form for the Equations of Motion


Our work in Chapters II to V was all concerned with one instant of time. It 
gave the general scheme of relations between states and observations at that one 
instant of time. To get a complete theory of dynamics we must consider also the 
connexion between different instants of time and set up something of the nature 
of equations of motion. 

The state of our system at each instant of time will be represented by some
vector $\psi$ and we have to find the law of variation of $\psi$ with the time $t$. For
this purpose we use the general principle of superposition, according to which,
as discussed in §6, any superposition relationship between states holding at one
instant of time will hold throughout all time. Thus if, for example, we have three
states at one instant of time, represented by three vectors $\psi_0$, $\psi_1$, $\psi_2$ satisfying

$$\psi_0 = c_1\psi_1 + c_2\psi_2,$$

these states will vary with the time in such a way that at any other instant of time
they will be represented by three vectors, $\psi_0'$, $\psi_1'$, $\psi_2'$ say, which satisfy, provided the
arbitrary numerical factors by which these vectors may be multiplied are suitably
chosen,

$$\psi_0' = c_1\psi_1' + c_2\psi_2',$$

with the same coefficients $c_1$ and $c_2$. This requires, as we had in §29 in connexion
with equations (44) and (45) referring to a displacement, that each $\psi'$ shall be the
result of some linear operator applied to the corresponding $\psi$. If we now take the
second instant of time, to which $\psi'$ belongs, to differ by only a small time interval
$\delta t$ from the first and form the differential coefficient

$$\frac{d\psi}{dt} = \lim_{\delta t \to 0} \frac{\psi' - \psi}{\delta t},$$

then $d\psi/dt$ must also be the result of some linear operator applied to the
corresponding $\psi$.




We put

$$i\hbar \frac{d\psi}{dt} = H\psi, \tag{1}$$

where $H$ is a linear operator independent of $\psi$. This gives the general law for the
variation of $\psi$-vectors with the time. We make the further assumption that $H$ is a
Hermitian operator. This has the effect of making any scalar product of a $\phi$ with
a $\psi$ constant, since it causes the conjugate imaginary of (1) to be

$$-i\hbar \frac{d\phi}{dt} = \phi H, \tag{2}$$

so that

$$i\hbar \frac{d}{dt}(\phi\psi) = i\hbar\frac{d\phi}{dt}\psi + i\hbar\,\phi\frac{d\psi}{dt} = -(\phi H)\psi + \phi(H\psi) = 0.$$

By arguments similar to those used for $d_x$ in §29, we can deduce that $H$ is
undetermined to the extent of an arbitrary real additive† number.

Formula (1) shows how all the states of our system vary with the time and is
one of the fundamental ways of expressing the equations of motion of quantum
mechanics. Written in terms of representatives in a representation in which, say,
each of the complete set of commuting observables $q$ is diagonal, it appears as‡

$$i\hbar\frac{d}{dt}(q'|\psi) = \int (q'|H|q'')\, dq''\, (q''|\psi). \tag{3}$$

In this form it is known as Schrödinger's wave equation, having been first put
forward by Erwin Schrödinger in 1926, and is very extensively used in practical
applications of the theory. Its solutions are called wave functions, owing to the fact
that in a great many problems they are of the kind of function which represents
waves; in fact, as we shall see in §34, they are so, if the $q$'s are taken to be
dynamical coordinates, in all those problems in which the classical theory holds
as an approximation. The square of the modulus of a normalized solution gives
the probability of the $q$'s having specified values at any time for some particular
state of motion of the system. Formula (2) written in terms of representatives
gives the conjugate complex equation to (3), namely

$$-i\hbar\frac{d}{dt}(\phi|q') = \int (\phi|q'')\, dq''\, (q''|H|q'), \tag{4}$$

which is equally fundamental in general theory but is not so often explicitly used
in practice.


† [‘additive’ omitted.]
‡ The case of continuous $q'$'s is taken for definiteness, the usual modifications of notation
being required for the discrete case.




The linear operator H introduced in (1) we call the Hamiltonian of the system. 
There is one such linear operator for each dynamical system. We assume it 
to be always an observable and to be, in fact, the total energy of the system. 
Its analogy with the Hamiltonian of classical mechanics will become apparent in 
the next section. Like the classical Hamiltonian it may either be constant or vary 
with the time, one or other of these possibilities occurring according to whether 
there are present only forces of interaction between the various component parts 
of the system or whether there are also external forces present. The constancy or 
variability with time of the linear operator $H$ implies, of course, the constancy or
variability with time of its representative $(q'|H|q'')$.

When $H$ is constant we can write down a formal solution of (1), namely

$$\psi_t = e^{-iHt/\hbar}\,\psi_0, \tag{5}$$

$\psi_0$ being the value of any $\psi$ at time 0 and $\psi_t$ its value at time $t$. This solution may
be verified by direct substitution in (1), it being noted that the differentiation
of the exponential can be carried out in the ordinary way since there are
no non-commuting quantities involved. In practical problems the solution (5)
is not often of use, owing to the difficulty of evaluating the exponential, and one
usually has to work from the differential equation (3) instead.
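A small numerical sketch of the formal solution (5) follows, assuming a two-level system with a constant Hermitian $H$ and units with $\hbar = 1$. At this size the exponential is easy to evaluate from a truncated Taylor series. The sketch checks that the length of $\psi$ is conserved, as the Hermitian character of $H$ requires, and that a central-difference estimate of $d\psi/dt$ satisfies (1). The matrices, helper names and step size are hypothetical choices.

```python
def matmul(A, B):
    """Product of square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matvec(A, v):
    """Apply a matrix to a column vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def expm(A, terms=60):
    """Matrix exponential by its Taylor series (adequate for small matrices of modest norm)."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, A))
        result = [[x + y for x, y in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


HBAR = 1.0                                      # assumption: units with hbar = 1
H = [[1.0, 0.5 - 0.25j], [0.5 + 0.25j, -0.5]]   # a constant Hermitian Hamiltonian (illustrative)


def evolve(psi0, t):
    """Formula (5): psi_t = exp(-iHt/hbar) psi_0."""
    return matvec(expm(scale(-1j * t / HBAR, H)), psi0)


def norm2(v):
    return sum(abs(c) ** 2 for c in v)


psi0 = [0.6, 0.8j]                # normalized: 0.36 + 0.64 = 1
t, dt = 2.0, 1e-5
psi_t = evolve(psi0, t)

# Check (1): i hbar dpsi/dt = H psi, with the derivative taken by central differences.
dpsi = [(a - b) / (2 * dt) for a, b in zip(evolve(psi0, t + dt), evolve(psi0, t - dt))]
residual = max(abs(1j * HBAR * d - h) for d, h in zip(dpsi, matvec(H, psi_t)))

print(norm2(psi_t), residual)
```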

It may happen that a particular state of our system does not vary with the
time. It is then called a stationary state. The condition for a state to be stationary
is that it shall be represented by a $\psi$ whose direction remains constant, i.e.

$$\frac{d\psi}{dt} = \lambda\psi, \tag{6}$$

where $\lambda$ is a number. Combining this equation with (1) we get

$$H\psi = i\hbar\lambda\psi,$$

which is just the condition that $\psi$ shall be an eigen-$\psi$ of $H$. Thus the stationary
states are the eigenstates of the Hamiltonian. It is necessary that equation (6)
shall hold throughout all time and hence $\psi$ must be an eigen-$\psi$ of $H$ throughout
all time. This is usually possible only when $H$ is constant, so that stationary states
usually exist only for a dynamical system with constant Hamiltonian. There are
then so many of them that an arbitrary state is dependent on them (from our
assumption that the Hamiltonian is an observable). For each of these stationary
states the Hamiltonian or energy has a definite value, namely the eigenvalue $H'$
to which the state belongs, equal to $i\hbar$ times the $\lambda$ of equation (6), and the $\psi$
representing the state varies with time according to the law

$$\psi_t = e^{-iH't/\hbar}\,\psi_0, \tag{7}$$

i.e. the simple harmonic law, with a frequency depending only on the associated
energy value.
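The same kind of sketch illustrates stationary states, again assuming $\hbar = 1$ and a two-level Hamiltonian chosen so that its eigenvectors are known in closed form ($H$ below has eigenvalues 3 and 1, with eigenvectors $(1, \pm 1)/\sqrt 2$). An eigenvector of $H$ only acquires the phase predicted by (7), while a vector that is not an eigenvector changes direction. The Taylor-series helpers and all values are illustrative.

```python
import cmath
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def expm(A, terms=60):
    """Matrix exponential by its Taylor series."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, A))
        result = [[x + y for x, y in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


HBAR = 1.0                      # assumption: units with hbar = 1
H = [[2.0, 1.0], [1.0, 2.0]]    # constant Hamiltonian with eigenvalues 3 and 1


def evolve(psi0, t):
    """psi_t = exp(-iHt/hbar) psi_0."""
    return matvec(expm(scale(-1j * t / HBAR, H)), psi0)


t = 0.7
eigen = [1 / math.sqrt(2), 1 / math.sqrt(2)]     # eigenvector belonging to H' = 3
psi_t = evolve(eigen, t)
phase = cmath.exp(-1j * 3.0 * t / HBAR)          # the factor in law (7)
print([p / e for p, e in zip(psi_t, eigen)], phase)

# A superposition of the two eigenstates is not stationary: its direction changes.
mixed = evolve([1.0, 0.0], t)
print(abs(mixed[0]) ** 2)                        # equals cos(t)**2 here, not 1
```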




32. Heisenberg’s Form for the Equations of Motion 


In the preceding section we had a picture of the states of our dynamical system
represented by vectors in a certain vector space, these vectors varying with time
in order to correspond to the changes taking place in the states. We shall call
this the Schrödinger picture. On account of the linear form of the law of variation
of the vectors with time, as shown by equation (1), we may adopt an alternative
picture in which the vectors representing the states are all fixed, but are referred to
a moving system of coordinates. We shall call this the Heisenberg picture. The two
pictures are, of course, formally equivalent. In both of them the coordinates of
a $\psi$ representing a state vary in the same way, namely according to (3), the only
difference being that in one of them this variation is ascribed to a motion of the $\psi$'s
themselves and in the other it is ascribed to a motion of the system of coordinates.

In the Schrödinger picture a dynamical variable is represented by a constant
linear operator. In the Heisenberg picture a dynamical variable will be represented
by a linear operator fixed relative to the coordinate system and therefore,
in general, varying with time. Let us determine its law of variation.

A vector $\psi$ fixed relative to the coordinate system in the Heisenberg picture
must vary with time according to the formula

$$i\hbar \frac{d\psi}{dt} = -H\psi, \tag{8}$$

that is, formula (1) with a minus sign, since this is the time-variation which must be
superposed on (1) to bring $\psi$ to rest. The $H$ in (8) is at any time the same function
of the dynamical variables as the $H$ in (1), though these dynamical variables are
now represented by moving linear operators. The condition for a linear operator
$\xi$ to be fixed relative to the coordinate system is that, when multiplied into any
vector $\psi_a$ fixed relative to the coordinate system, the resulting $\psi$-vector

$$\xi\psi_a = \psi_b \tag{9}$$

shall also be fixed relative to the coordinate system. Differentiating (9), we get

$$\frac{d\xi}{dt}\psi_a + \xi\frac{d\psi_a}{dt} = \frac{d\psi_b}{dt},$$

and with the help of formula (8) applied to $\psi_a$ and $\psi_b$ we find

$$i\hbar\frac{d\xi}{dt}\psi_a - \xi H\psi_a = -H\psi_b = -H\xi\psi_a.$$

Since this holds for arbitrary $\psi_a$, we can cancel out $\psi_a$, obtaining

$$i\hbar\frac{d\xi}{dt} = \xi H - H\xi. \tag{10}$$




Equation (10) gives the law of variation of dynamical variables with time in
Heisenberg's picture and is Heisenberg's form for the equations of motion. It is
comparable with the classical equations of motion, since these are also concerned
with the variation of dynamical variables and not, like Schrödinger's form for
the quantum equations of motion, with the variation of states. The classical
equations of motion are

$$\frac{dq_r}{dt} = \frac{\partial H}{\partial p_r}, \qquad \frac{dp_r}{dt} = -\frac{\partial H}{\partial q_r}, \tag{11}$$

$H$ being the classical Hamiltonian and the $q$'s and $p$'s a set of canonical coordinates
and momenta. They give, for $\xi$ any function of the $q$'s and $p$'s that does not contain
the time $t$ explicitly,

$$\frac{d\xi}{dt} = \sum_r\left(\frac{\partial\xi}{\partial q_r}\frac{dq_r}{dt} + \frac{\partial\xi}{\partial p_r}\frac{dp_r}{dt}\right) = \sum_r\left(\frac{\partial\xi}{\partial q_r}\frac{\partial H}{\partial p_r} - \frac{\partial\xi}{\partial p_r}\frac{\partial H}{\partial q_r}\right) = [\xi, H], \tag{12}$$


with the classical definition of a P.B., equation (1) of Chapter V. But equation (10) 
takes precisely the form (12) with the quantum definition of a P.B., equation (7) of 
Chapter V. We thus get an analogy between the classical and quantum equations 
of motion, on the basis of the analogy between classical and quantum P.B.’s, 
discussed in Chapter V, and we also get a justification for calling the linear operator 
H introduced by equation (1) the Hamiltonian of the quantum-mechanical system. 
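The classical side of the analogy can be checked with a hypothetical one-dimensional harmonic oscillator, $H = p^2/2m + kq^2/2$, with arbitrary values of $m$ and $k$. Estimating the classical P.B. by central differences, $[q, H]$ and $[p, H]$ reproduce the right-hand sides of Hamilton's equations (11), as (12) requires for $\xi = q$ and $\xi = p$. This is only a numerical sketch; for a quadratic $H$ the central differences are exact apart from rounding.

```python
def poisson_bracket(u, v, q, p, h=1e-5):
    """Classical P.B. [u, v] = du/dq dv/dp - du/dp dv/dq for one degree of freedom,
    estimated by central differences at the phase-space point (q, p)."""
    du_dq = (u(q + h, p) - u(q - h, p)) / (2 * h)
    du_dp = (u(q, p + h) - u(q, p - h)) / (2 * h)
    dv_dq = (v(q + h, p) - v(q - h, p)) / (2 * h)
    dv_dp = (v(q, p + h) - v(q, p - h)) / (2 * h)
    return du_dq * dv_dp - du_dp * dv_dq


M, K = 2.0, 3.0          # hypothetical mass and spring constant


def H(q, p):
    """Classical Hamiltonian of the oscillator."""
    return p * p / (2 * M) + K * q * q / 2


def coord(q, p):
    return q


def momentum(q, p):
    return p


q0, p0 = 0.4, -1.2
print(poisson_bracket(coord, H, q0, p0))      # approximately p0 / M = -0.6, i.e. dq/dt
print(poisson_bracket(momentum, H, q0, p0))   # approximately -K * q0 = -1.2, i.e. dp/dt
```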

Our general derivation of equation (10) shows that the equations of motion of 
any dynamical system in quantum mechanics are determined by a Hamiltonian, 
whether the system is one that has a classical analogue and is describable in terms 
of canonical coordinates and momenta or not. A system is defined mathematically 
by its Hamiltonian being given. When the system does have a classical analogue, 
it is usually permissible to assume that the Hamiltonian is the same function 
of the dynamical variables as in the analogous classical system.† There would be
a difficulty in this, of course, if the classical Hamiltonian involved a product of 
factors whose quantum analogues do not commute, as one would not know in 
which order to put these factors in the quantum Hamiltonian, but this does not 
happen for most of the elementary dynamical systems whose study is important for 
atomic physics. In consequence we are able also largely to use the same language 


† This assumption is found in practice to be successful only when applied with the dynamical
coordinates and momenta referring to a Cartesian system of axes and not to more general 
curvilinear coordinates. 




for describing dynamical systems in the quantum theory as in the classical theory 
(e.g. to talk about particles with given masses moving through given fields of force), 
and when given a system in classical mechanics, can usually give a meaning to 
‘the same’ system in quantum mechanics. 

A system in quantum mechanics is usually defined by its Hamiltonian being 
given as an algebraic function of dynamical variables, the nature of these dynamical 
variables being defined by their quantum conditions. This does not include
the most general systems, however. It is possible to have a system whose 
Hamiltonian is not expressible algebraically in terms of dynamical variables, 
but can be specified only through its representative in some representation 
being given. An example of such a system is provided by the interaction of a photon 
with an atom, as will be dealt with in Chapter XI. 

The equation of motion (12) must be generalized when $\xi$ involves
the time $t$ explicitly as well as being a function of the dynamical variables.
The generalization is, of course,

$$\frac{d\xi}{dt} = \frac{\partial\xi}{\partial t} + [\xi, H],$$

in the quantum theory as well as in the classical theory. The generalization of (10)
is thus

$$i\hbar\frac{d\xi}{dt} = i\hbar\frac{\partial\xi}{\partial t} + \xi H - H\xi. \tag{13}$$

A function of the dynamical variables not involving the time explicitly is,
according to (10), a constant if it commutes with $H$. It is then called a constant
of the motion. It must commute with $H$ at all times, which is possible usually
only if $H$ is a constant. The constancy of $H$ in our present Heisenberg picture
requires, according to (13) applied with $\xi = H$, that $\partial H/\partial t = 0$, or that
$H$ is a function† of the dynamical variables not involving the time explicitly,
and therefore is a constant also in the Schrödinger picture. The result that $H$
is a constant of the motion if $\partial H/\partial t = 0$ is a formal expression of the law of
the conservation of energy for a system in which there are no external forces.
The corresponding formal expression of conservation of momentum follows from
the requirement that the Hamiltonian of a system with no external forces must
be an observable that is unchanged by a displacement of the type considered
in §29 and must therefore, according to equation (57) of that section with $\xi = H$,
commute with the displacement operator, i.e. according to (59) of that section,
with the total momentum. Conservation of angular momentum may be deduced
in a similar way, for a system whose Hamiltonian is symmetrical, with the help of
the rotation operators of §29.


† In a generalized sense, not necessarily an algebraic function.




We can conveniently work with a fixed representation in the Heisenberg 
picture only for dynamical systems whose Hamiltonian is constant. We then 
take the Hamiltonian itself to be diagonal. A representation of this type 
we call a Heisenberg representation, as it was introduced by Werner Heisenberg 
in 1925. It was historically the first form of quantum mechanics to be discovered. 
In a Heisenberg representation every diagonal matrix represents a function of 
the dynamical variables that commutes with the Hamiltonian and is therefore 
a constant of the motion. The problem of setting up a Heisenberg representation 
thus reduces to the problem of finding a complete set of commuting observables, 
each of which is a constant of the motion, and then making these observables 
diagonal. The Hamiltonian itself may be one of these observables. Each of the basic 
states of the representation is an eigenstate of H and is therefore, according to 
a result of the preceding section, a stationary state. 

Take a Heisenberg representation with the complete set of commuting
observables $\alpha$, each of which is a constant of the motion, diagonal. From a theorem
on page 55 the Hamiltonian $H$, being diagonal, must be a function of the $\alpha$'s,
say $H(\alpha)$. Thus, taking for definiteness the case of discrete eigenvalues for the $\alpha$'s,
we shall have for the representative of $H$, from formula (30) of page 55,

$$(\alpha'|H|\alpha'') = H'\,\delta_{\alpha'\alpha''}, \tag{14}$$

where $H'$ is short for $H(\alpha')$. If now $\xi$ denotes any dynamical variable, or any
function of the dynamical variables not involving the time explicitly, we obtain,
expressing (10) in terms of representatives,

$$i\hbar\frac{d}{dt}(\alpha'|\xi|\alpha'') = (\alpha'|\xi|\alpha'')H'' - H'(\alpha'|\xi|\alpha''),$$

or

$$i\hbar\frac{d}{dt}(\alpha'|\xi|\alpha'') = -(H' - H'')(\alpha'|\xi|\alpha'').$$

Hence

$$(\alpha'|\xi|\alpha'') = (\alpha'|\xi|\alpha'')_0\, e^{i(H' - H'')t/\hbar}, \tag{15}$$

where $(\alpha'|\xi|\alpha'')_0$ is independent of $t$. Formula (15) shows how the matrix elements
representing any dynamical variable in a Heisenberg representation vary with
the time. The variation is simply periodic with the frequency

$$\frac{|H' - H''|}{2\pi\hbar} = \frac{|H' - H''|}{h}, \tag{16}$$


depending only on the energy difference of the two stationary states to which 
the matrix element refers. This result contains the essence of the Combination 
Law of Spectroscopy and of Bohr’s Frequency Condition, according to which (16) 
is the frequency of the electromagnetic radiation emitted or absorbed when 
the system makes a transition under the influence of the radiation between 




the stationary states $\alpha'$ and $\alpha''$, the eigenvalues of $H$ being Bohr's energy levels.
These matters will be dealt with in §48.
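Formula (15) can be illustrated numerically. The sketch assumes $\hbar = 1$ and a Hamiltonian with known eigenvalues 3 and 1 and eigenvectors $(1, \pm 1)/\sqrt 2$, so that the change of basis to a Heisenberg representation can be written down directly. It evolves an arbitrary $\xi$ by $\xi_t = e^{iHt/\hbar}\xi_0e^{-iHt/\hbar}$ (which solves (10) when $H$ is constant) and checks that the off-diagonal matrix element in the energy basis picks up exactly the factor $e^{i(H' - H'')t/\hbar}$, so that it oscillates at the frequency (16). All numbers are illustrative.

```python
import cmath
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def expm(A, terms=60):
    """Matrix exponential by its Taylor series."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, A))
        result = [[x + y for x, y in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


HBAR = 1.0                          # assumption: units with hbar = 1
H = [[2.0, 1.0], [1.0, 2.0]]        # eigenvalues H' = 3 and H'' = 1
r = 1 / math.sqrt(2)
S = [[r, r], [r, -r]]               # columns are the eigenvectors; S is its own inverse
xi0 = [[0.5, 1j], [-1j, -0.2]]      # an arbitrary dynamical variable at t = 0


def heisenberg(t):
    """xi_t = exp(iHt/hbar) xi_0 exp(-iHt/hbar)."""
    U = expm(scale(1j * t / HBAR, H))
    U_inv = expm(scale(-1j * t / HBAR, H))
    return matmul(matmul(U, xi0), U_inv)


def in_energy_basis(A):
    """Matrix elements (alpha'|A|alpha'') in the basis of eigenvectors of H."""
    return matmul(matmul(S, A), S)


t = 1.3
element_0 = in_energy_basis(xi0)[0][1]
element_t = in_energy_basis(heisenberg(t))[0][1]
predicted = element_0 * cmath.exp(1j * (3.0 - 1.0) * t / HBAR)   # formula (15)
print(element_t, predicted)
print(abs(3.0 - 1.0) / (2 * math.pi * HBAR))                      # frequency (16)
```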

The above representation with the constants of the motion $\alpha$ diagonal is fixed
in the Heisenberg picture, and is thus moving in the Schrödinger picture. We could
introduce a representation with the $\alpha$'s diagonal, which is fixed in the Schrödinger
picture and is thus moving in the Heisenberg picture. The two representations
would differ only in the phase factors. The representative, $(\alpha'|\xi|\alpha'')^*$ say,
of a dynamical variable $\xi$ in the latter representation would not vary with the time
and would thus, according to (15), be connected with the representative $(\alpha'|\xi|\alpha'')$
in the former representation by the law

$$(\alpha'|\xi|\alpha'')^* = (\alpha'|\xi|\alpha'')\, e^{-i(H' - H'')t/\hbar},$$

with neglect of a possible constant phase factor. Hence the representative $(\alpha'|)^*$
of a $\psi$ in the latter representation would be connected with its representative $(\alpha'|)$
in the former by

$$(\alpha'|)^* = (\alpha'|)\, e^{-iH't/\hbar}. \tag{17}$$


33. The Action Principle 


The analogy between Heisenberg’s form for the equations of motion (10) and 
the classical equation of motion (12) enables us to pursue the analogy between 
classical dynamics and quantum dynamics further and to see how all the main 
principles and results of the classical theory reappear in the quantum theory in 
a generalized form. 

If we denote by $\xi_t$ the dynamical variable $\xi$ at time $t$, then equation (10)
gives us, for $\delta t$ infinitesimal,

$$i\hbar(\xi_{t+\delta t} - \xi_t) = \delta t\,(\xi_t H - H\xi_t),$$

or

$$\xi_{t+\delta t} - \xi_t = i(\delta t/\hbar)(H\xi_t - \xi_t H).$$


Comparing this with (76) of Chapter V, we see that the dynamical variables at 
time t + dt are connected with the dynamical variables at time t by an infinitesimal 
contact transformation. Thus the changing of the dynamical variables under 
the equation of motion (10) may be regarded as the continual development 
of a contact transformation. After the lapse of a finite time the dynamical 
variables will be connected with the initial dynamical variables by a finite contact 
transformation. These results are formally the same as in classical mechanics. 
One might expect them in quantum mechanics simply from the requirement 
that the quantum conditions must hold throughout all time, the only general 




transformations which leave invariant quantum conditions, or any algebraic 
equations, being the contact transformations of §30. 

If the Hamiltonian is a constant, the contact transformation connecting
the dynamical variables at time $t$, $\xi_t$, with the dynamical variables at time 0, $\xi_0$,
may be written

$$\xi_t = e^{iHt/\hbar}\,\xi_0\,e^{-iHt/\hbar}. \tag{18}$$

To verify this equation, we note that it obviously holds for $t = 0$, and when
differentiated with respect to $t$ gives

$$\frac{d\xi_t}{dt} = \frac{de^{iHt/\hbar}}{dt}\,\xi_0\,e^{-iHt/\hbar} + e^{iHt/\hbar}\,\xi_0\,\frac{de^{-iHt/\hbar}}{dt} = \frac{i}{\hbar}He^{iHt/\hbar}\xi_0e^{-iHt/\hbar} - \frac{i}{\hbar}e^{iHt/\hbar}\xi_0e^{-iHt/\hbar}H,$$

or

$$i\hbar\frac{d\xi_t}{dt} = -H\xi_t + \xi_t H,$$

which is just the equation of motion (10). Equation (18) thus provides an explicit
solution in symbolic form of the differential equation (10). This solution,
like equation (5), is not often useful in practice, owing to the difficulty of evaluating
the exponentials.
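For a small system the check of (18) against (10) can also be done numerically. The sketch assumes $\hbar = 1$, a constant $2\times 2$ Hermitian $H$ and an arbitrary initial $\xi_0$; it compares $i\hbar\,d\xi_t/dt$, estimated by central differences, with $\xi_t H - H\xi_t$. All values and helper names are illustrative.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def scale(c, A):
    return [[c * x for x in row] for row in A]


def expm(A, terms=60):
    """Matrix exponential by its Taylor series."""
    n = len(A)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(1 / k, matmul(term, A))
        result = [[x + y for x, y in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result


HBAR = 1.0                                      # assumption: units with hbar = 1
H = [[1.0, 0.3 - 0.2j], [0.3 + 0.2j, -0.7]]     # constant Hermitian Hamiltonian
xi0 = [[0.0, 1.0], [1.0, 0.0]]                  # the dynamical variable at time 0


def xi(t):
    """Equation (18): xi_t = exp(iHt/hbar) xi_0 exp(-iHt/hbar)."""
    U = expm(scale(1j * t / HBAR, H))
    U_inv = expm(scale(-1j * t / HBAR, H))
    return matmul(matmul(U, xi0), U_inv)


t, dt = 0.8, 1e-5
xi_t = xi(t)
deriv = [[(a - b) / (2 * dt) for a, b in zip(r1, r2)] for r1, r2 in zip(xi(t + dt), xi(t - dt))]
lhs = scale(1j * HBAR, deriv)                                          # i hbar d(xi_t)/dt
rhs = [[x - y for x, y in zip(r1, r2)]
       for r1, r2 in zip(matmul(xi_t, H), matmul(H, xi_t))]           # xi_t H - H xi_t
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))
```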

In the Heisenberg picture in which the states are represented by fixed $\psi$-vectors
and the dynamical variables by varying linear operators, we may introduce a fixed
representation in which the diagonal observables are dynamical variables at some
definite time $t$. They may, for instance, be the coordinates at time $t$, $q_t$ say,
assuming the system to have canonical coordinates and momenta. We should
then have one representation for each time $t$ and should have a transformation
function $(q_t'|q_T'')$ connecting the representations referring to two different times $t$
and $T$. The law of transformation for the representative of a $\psi$ will be

$$(q_t'|) = \int (q_t'|q_T'')\, dq_T''\, (q_T''|).$$

If in this equation we vary $t$ keeping $T$ and the function $(q_T''|)$ fixed,
the resulting $(q_t'|)$ will give us the representative at various times of a fixed $\psi$
referred to the moving axes of the Heisenberg picture. This must be the same
as the representative of a moving $\psi$, representing a state as it varies with time,
referred to the fixed axes of the Schrödinger picture, and must therefore satisfy
Schrödinger's wave equation (3), i.e.

$$i\hbar\frac{d}{dt}\int (q_t'|q_T'')\, dq_T''\, (q_T''|) = \int (q_t'|H|q_t''')\, dq_t''' \int (q_t'''|q_T'')\, dq_T''\, (q_T''|).$$


This holds for an arbitrary function $(q_T''|)$ and hence

$$i\hbar\frac{d}{dt}(q_t'|q_T'') = \int (q_t'|H|q_t''')\, dq_t'''\, (q_t'''|q_T''). \tag{19}$$

Thus the transformation function $(q_t'|q_T'')$, considered as a function of
the variables $q_t'$, is a solution of Schrödinger's wave equation. Similarly, considered
as a function of the variables $q_T''$, it satisfies an equation of the form (4), namely

$$-i\hbar\frac{d}{dT}(q_t'|q_T'') = \int (q_t'|q_T''')\, dq_T'''\, (q_T'''|H|q_T''). \tag{20}$$


From the analogy between classical and quantum contact transformations
discussed in §30, we see that $(q_t'|q_T'')$ corresponds in the classical theory to $e^{iS/\hbar}$,
where $S$ is Hamilton's principal function for the time interval $T$ to $t$, equal to
the time-integral of the Lagrangian $L$,

$$S = \int_T^t L\, dt. \tag{21}$$

Taking an infinitesimal time interval $t$ to $t + \delta t$, we see that $(q_{t+\delta t}'|q_t'')$ corresponds
to $e^{iL\delta t/\hbar}$. This result gives probably the most fundamental quantum analogue
for the classical Lagrangian function. It is preferable for the sake of the analogy
to consider the classical Lagrangian as a function of the coordinates at time $t$ and
the coordinates at time $t + \delta t$, instead of a function of the coordinates and velocities
at time $t$.

There is an important action principle in classical mechanics concerning
Hamilton's principal function (21). It says that this function remains stationary for
small variations of the trajectory of the system which do not alter the end points,
i.e. for small variations of the $q$'s at all intermediate times between $T$ and $t$ with $q_T$
and $q_t$ fixed. Let us see what it corresponds to in the quantum theory.

Put

$$\exp\left\{i\int_{t_a}^{t_b} L\, dt/\hbar\right\} = \exp\{iS(t_b, t_a)/\hbar\} = B(t_b, t_a), \tag{22}$$

so that $B(t_b, t_a)$ corresponds to $(q_{t_b}'|q_{t_a}'')$ in the quantum theory. Now suppose
the time interval $T \to t$ to be divided up into a large number of small time intervals
$T \to t_1$, $t_1 \to t_2$, ..., $t_{m-1} \to t_m$, $t_m \to t$, by the introduction of a sequence of
intermediate times $t_1, t_2, \ldots, t_m$. Then

$$B(t, T) = B(t, t_m)\,B(t_m, t_{m-1}) \cdots B(t_2, t_1)\,B(t_1, T). \tag{23}$$

The corresponding quantum equation, which follows from the composition law (43)
of Chapter III, is

$$(q_t'|q_T'') = \int\!\!\int\cdots\int (q_t'|q_m')\, dq_m'\, (q_m'|q_{m-1}')\, dq_{m-1}' \cdots (q_2'|q_1')\, dq_1'\, (q_1'|q_T''), \tag{24}$$




$q'_k$ being written for $q'_{t_k}$ for brevity. At first sight there does not seem to be any close correspondence between (23) and (24). We must, however, analyse the meaning of (23) rather more carefully. We must regard each factor B as a function of the q's at the two ends of the time interval to which it refers. This makes the right-hand side of (23) a function, not only of $q_t$ and $q_T$, but also of all the intermediate q's. Equation (23) is valid only when we substitute for the intermediate q's in its right-hand side their values for the real trajectory, small variations in which values leave S(t,T) stationary and therefore also, from (22), leave B(t,T) stationary. It is the process of substituting these values for the intermediate q's which corresponds to the integrations over all values for the intermediate $q'$'s in (24). The quantum analogue of Hamilton's action principle is thus absorbed in the composition law (24), and the classical requirement that the values of the intermediate q's shall make S(t,T) stationary corresponds to the condition in quantum mechanics that all values of the intermediate $q'$'s are important in proportion to their contribution to the integral in (24).
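The classical side of the comparison between (23) and (24) can be checked directly in a simple case. The sketch below assumes a free particle of mass m, for which the principal function over a short interval δt is $m(q_b - q_a)^2/2\delta t$; this example and all numerical values are illustrative choices, not taken from the text. With the intermediate q's placed on the real (straight-line) trajectory, the segment actions add up to S(t,T), so the factors B multiply to B(t,T). Displacing one intermediate q changes S only at second order.

```python
import cmath


def segment_action(q_end, q_start, dt, m=1.0):
    """Principal function of an (assumed) free particle over one short interval."""
    return m * (q_end - q_start) ** 2 / (2.0 * dt)


def path_action(points, dt, m=1.0):
    """Sum of segment actions along the discretized path q_T, q_1, ..., q_m, q_t."""
    return sum(segment_action(b, a, dt, m) for a, b in zip(points, points[1:]))


m, h = 1.0, 0.05                       # illustrative values
q_T, q_t, T, t, n_steps = 0.0, 2.0, 0.0, 1.0, 10
dt = (t - T) / n_steps

# Intermediate q's placed on the classical trajectory (uniform motion).
classical = [q_T + (q_t - q_T) * k / n_steps for k in range(n_steps + 1)]

S_direct = segment_action(q_t, q_T, t - T, m)
S_summed = path_action(classical, dt, m)

# Product of the short-interval factors B = exp(iS/h), as on the right of (23).
B_product = 1.0 + 0j
for a, b in zip(classical, classical[1:]):
    B_product *= cmath.exp(1j * segment_action(b, a, dt, m) / h)
B_direct = cmath.exp(1j * S_direct / h)

# Moving one intermediate point by eps changes S by m*eps^2/dt: second order only.
eps = 1e-3
perturbed = list(classical)
perturbed[3] += eps
dS = path_action(perturbed, dt, m) - S_summed
print(S_direct, S_summed, dS / eps ** 2)
```

Away from the classical trajectory the equality (23) fails, and this is the point made in the text: in the quantum composition law (24) the integrations over the intermediate $q'$'s take the place of this substitution.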

Let us see how (23) can be a limiting case of (24) for h small. We must suppose the integrand in (24) to be of the form $e^{iF/h}$, where F is a function of $q'_T, q'_1, q'_2, \ldots, q'_m, q'_t$ which remains continuous as h tends to zero, so that the integrand is a rapidly oscillating function when h is small. The integral of such a rapidly oscillating function will be extremely small, except for the contribution arising from a region in the domain of integration where comparatively large variations in the $q'_k$ produce only very small variations in F. Such a region must be the neighbourhood of a point where F is stationary for small variations of the $q'_k$. Thus the integral in (24) is determined essentially by the value of the integrand at a point where the integrand is stationary for small variations of the intermediate $q'$'s, and so (24) goes over into (23).
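The argument above is the method of stationary phase. The one-dimensional numerical sketch below illustrates it under assumed choices that are not from the text: the amplitude is $g(q) = e^{-q^2}$, chosen to suppress end-point contributions; the phase function is $F(q) = (q - q_0)^2$; and the grid and values of h are arbitrary. It compares the full integral of $g\,e^{iF/h}$ with the estimate obtained from the neighbourhood of the stationary point $q_0$ alone. The discrepancy shrinks roughly in proportion to h.

```python
import cmath
import math


def simpson(f, a, b, n):
    """Composite Simpson rule for a (complex-valued) function f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def oscillatory_integral(h, q0=0.3, half_width=6.0, n=24000):
    """Integral of g(q) exp(iF(q)/h), with the assumed g = exp(-q^2) and F = (q - q0)^2."""
    def integrand(q):
        return math.exp(-q * q) * cmath.exp(1j * (q - q0) ** 2 / h)
    return simpson(integrand, -half_width, half_width, n)


def stationary_phase_estimate(h, q0=0.3):
    """Stationary-point contribution: g(q0) * sqrt(2*pi*h / F''(q0)) * exp(i*pi/4), with F'' = 2."""
    return math.exp(-q0 * q0) * math.sqrt(math.pi * h) * cmath.exp(1j * math.pi / 4)


relative_error = {}
for h in (0.1, 0.01):
    full = oscillatory_integral(h)
    estimate = stationary_phase_estimate(h)
    relative_error[h] = abs(full - estimate) / abs(estimate)
    print(f"h = {h}: relative error of the stationary-point estimate = {relative_error[h]:.4f}")
```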


34. The Motion of Wave Packets 


The comparison between classical and quantum mechanics may be discussed with reference to a wave function, $(q'_t|)$ or $(q'|)$, instead of, as we did above, with reference to a transformation function $(q'_t|q'_T)$. The transformation function $(q'_t|q'_T)$ is like a wave function in its dependence on the variables $q'_t$, as is shown by equation (19), and if we are interested only in the variables $q'_t$ and not in $q'_T$, the natural thing to do is to study a wave function instead of the transformation function. The resulting simplification will enable us to push the comparison to a higher degree of accuracy without involving laborious calculations.

Let us take a quantum dynamical system having a classical analogue and 
therefore describable with canonical coordinates and momenta and assume that 
its Hamiltonian is a function of the coordinates and momenta expressible as 


120 VI. THE EQUATIONS OF MOTION 


a power series in the momenta. The Hamiltonian is thus expressible as a sum 
of terms, each of which is a product of various powers of the momenta and 
of various functions of the coordinates, with no restriction on the order of 
the factors. To facilitate comparison with the classical theory we shall suppose that these functions of the coordinates are all real and that the Hamiltonian does not involve i in any way. This condition does not mean any loss of generality in our dynamical system, since if it does not hold we can make it hold by simplifying the expression for the Hamiltonian in the following way. We can certainly express the Hamiltonian as

$$ H = H_1 + iH_2, \qquad (25) $$

in which $H_1$ and $H_2$ involve the coordinates only through real functions and do not involve i. $H_1$ and $H_2$ individually need not be Hermitian, although, of course, H must be. Thus, taking the conjugate complex of equation (25), we get

$$ H = \bar H_1 - i\bar H_2. \qquad (26) $$

According to the rules of §15 for obtaining conjugate complexes, $\bar H_1$ and $\bar H_2$ will be just $H_1$ and $H_2$ with the factors in all their terms in the reverse order, since each factor by itself is Hermitian. From (25) and (26) we have

$$ H = \tfrac12(H_1 + \bar H_1) + \tfrac12 i(H_2 - \bar H_2). \qquad (27) $$

For each term in $H_2$ there will be a corresponding term in $\bar H_2$ consisting of the same factors in the reverse order, and the difference of two such terms can be reduced, by means of general theorems on P.B.'s given in §25 and of equation (38) in §27, to ih times an expression not involving i in any way. By carrying out this reduction for all the terms in $H_2 - \bar H_2$ in (27), we get H in the required form not involving i.
It should be noted that in this form H remains unchanged if we reverse the order 
of the factors in every term. 

Since H is expressible as a power series in the momenta, in a representation in which the coordinates $q_r$ are diagonal it will be represented by a differential operator of the form (36) of Chapter V, and thus Schrödinger's wave equation (3) will read

$$ ih\frac{\partial}{\partial t}(q'|) = H\Big(q'_r, -ih\frac{\partial}{\partial q'_r}\Big)(q'|). \qquad (28) $$


Let us study the nature of the solution of (28) in the limiting case of h very small. 
We try to get a solution in the form of waves 


$$ (q'|) = e^{iS/h}A, \qquad (29) $$


where S and A are real functions of the q’’s and t which give the phase and 
amplitude respectively. The appearance of the factor A here marks a step towards 
higher accuracy than we had in the preceding section. 


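With (29), the effect of the operator $-ih\,\partial/\partial q'_r$ on the wave function $(q'|)$ is

$$ -ih\frac{\partial}{\partial q'_r}(q'|) = e^{iS/h}\Big(\frac{\partial S}{\partial q'_r} - ih\frac{\partial}{\partial q'_r}\Big)A, \qquad (30) $$

and that of the operator $ih\,\partial/\partial t$ is

$$ ih\frac{\partial}{\partial t}(q'|) = e^{iS/h}\Big(-\frac{\partial S}{\partial t} + ih\frac{\partial}{\partial t}\Big)A. $$

If f is any function of the operators $-ih\,\partial/\partial q'_r$ expressible as a power series, we find readily by repeated applications of (30)

$$ f\Big(-ih\frac{\partial}{\partial q'_r}\Big)(q'|) = e^{iS/h}\,f\Big(\frac{\partial S}{\partial q'_r} - ih\frac{\partial}{\partial q'_r}\Big)A. $$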


Thus when we substitute the expression (29) for $(q'|)$ into (28) we shall get, after cancelling the factor $e^{iS/h}$,

$$ \Big(-\frac{\partial S}{\partial t} + ih\frac{\partial}{\partial t}\Big)A = H\Big(q'_r, \frac{\partial S}{\partial q'_r} - ih\frac{\partial}{\partial q'_r}\Big)A. \qquad (31) $$

The operator on the right-hand side here is a power series in the $(\partial S/\partial q'_r - ih\,\partial/\partial q'_r)$'s and is thus a power series in the $(h\,\partial/\partial q'_r)$'s. We shall now neglect $h^2$ and thus neglect terms of higher degree than the first in this power series in the $(ih\,\partial/\partial q'_r)$'s. The terms of zero degree and of the first degree are real and imaginary† respectively, and therefore we shall have to equate the results of their operating on A to the real and imaginary† parts of the left-hand side of (31) respectively.

Equating the real parts on both sides of (31) we get, after cancelling the factor A,

$$ -\frac{\partial S}{\partial t} = H\Big(q'_r, \frac{\partial S}{\partial q'_r}\Big). \qquad (32) $$


This is just the Hamilton-Jacobi equation of classical mechanics, with S as 
Hamilton’s principal function, and is what we should expect from our work in 
the preceding section. 

Let us now pick out the terms of first degree in $ih\,\partial/\partial q'_r$ in the operator on the right-hand side of (31). These terms will give us an operator of the general form

$$ \sum_k X_k\, ih\frac{\partial}{\partial q'_r}\, Y_k, \qquad (33) $$


† [‘pure’ omitted.]




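where the X's and Y's are functions of the $q'$'s. The total coefficient of $ih\,\partial/\partial q'_r$ in (33), namely $\sum_k X_kY_k$, must be equal to

$$ \sum_k X_kY_k = -\frac{\partial H(q', \partial S/\partial q')}{\partial(\partial S/\partial q'_r)}, \qquad (34) $$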


but we cannot immediately use this result on account of the sandwiched positions of $ih\,\partial/\partial q'_r$ in (33). We must first use the condition mentioned above,
that the expression for the Hamiltonian in coordinates and momenta remains 
unchanged if we reverse the order of the factors in every term. This means that 
the operator on the right-hand side of (31), and hence also the operator (33), 
will remain unchanged if we reverse the order of the factors in every term. Thus 


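$$ \begin{aligned}
\sum_k X_k\, ih\frac{\partial}{\partial q'_r}\, Y_k &= \sum_k Y_k\, ih\frac{\partial}{\partial q'_r}\, X_k \\
&= \frac12\sum_k\Big\{X_k\, ih\frac{\partial}{\partial q'_r}\, Y_k + Y_k\, ih\frac{\partial}{\partial q'_r}\, X_k\Big\} \\
&= \frac12\sum_k\Big\{X_kY_k\, ih\frac{\partial}{\partial q'_r} + ih\frac{\partial}{\partial q'_r}\, Y_kX_k\Big\} \\
&= -\frac12\Big\{\frac{\partial H(q',\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\, ih\frac{\partial}{\partial q'_r} + ih\frac{\partial}{\partial q'_r}\,\frac{\partial H(q',\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\Big\}
\end{aligned} \qquad (35) $$

from (34). We must now equate the result of the operator (35) summed for all values of r, operating on A, to the imaginary† part of the left-hand side of (31). This gives

$$ \frac{\partial A}{\partial t} = -\frac12\sum_r\Big\{\frac{\partial H(q',\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\,\frac{\partial A}{\partial q'_r} + \frac{\partial}{\partial q'_r}\Big(\frac{\partial H(q',\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\,A\Big)\Big\}, $$

which, on multiplication by 2A, reduces to

$$ \frac{\partial A^2}{\partial t} = -\sum_r \frac{\partial}{\partial q'_r}\Big(\frac{\partial H(q',\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\,A^2\Big). \qquad (36) $$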


This is the equation for the amplitude A of the wave function. To get an understanding of its significance, let us suppose we have a fluid moving in the space of the variables q', the density of the fluid at any point and time being $A^2$ and its velocity

$$ \frac{dq'_r}{dt} = \frac{\partial H(q', \partial S/\partial q')}{\partial(\partial S/\partial q'_r)}. \qquad (37) $$


† [‘pure’ omitted.]




Equation (36) is then just the equation of conservation for such a fluid. There is 
one velocity function (37) for each function S satisfying (32). 
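The following is a concrete check of (32), (36) and (37) under assumed choices that are not in the text. It takes a free particle with H = p²/2m, the principal function $S = mq^2/2t$ (a packet spreading out from the origin), and the density $A^2 = g(q/t)/t$ with $g(x) = e^{-x^2}$. Both the Hamilton-Jacobi equation and the conservation equation are verified by central differences at a few sample points. This is a sketch, not a general solver.

```python
import math

m = 1.0  # mass (assumed value)


def S(q, t):
    """Assumed principal function m q^2 / 2t: free particle spreading out from q = 0 at t = 0."""
    return m * q * q / (2.0 * t)


def grad_S(q, t):
    """dS/dq, the classical momentum field p = m q / t."""
    return m * q / t


def density(q, t):
    """Assumed A^2: the profile exp(-(q/t)^2), diluted by 1/t as the packet spreads."""
    return math.exp(-(q / t) ** 2) / t


def velocity(q, t):
    """(37) for H = p^2/2m: dq/dt = dH/dp evaluated at p = dS/dq, i.e. (dS/dq)/m."""
    return grad_S(q, t) / m


def flux(q, t):
    return velocity(q, t) * density(q, t)


def d_dq(f, q, t, d=1e-5):
    return (f(q + d, t) - f(q - d, t)) / (2 * d)


def d_dt(f, q, t, d=1e-5):
    return (f(q, t + d) - f(q, t - d)) / (2 * d)


hj_residual = 0.0          # max |-dS/dt - H(q, dS/dq)|, cf. (32)
continuity_residual = 0.0  # max |dA^2/dt + d(v A^2)/dq|, cf. (36)
for q in (-1.5, -0.4, 0.0, 0.7, 2.0):
    for t in (0.8, 1.0, 2.5):
        p = grad_S(q, t)
        hj_residual = max(hj_residual, abs(-d_dt(S, q, t) - p * p / (2 * m)))
        continuity_residual = max(continuity_residual,
                                  abs(d_dt(density, q, t) + d_dq(flux, q, t)))

print(hj_residual, continuity_residual)
```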

Let us take a solution of (36) for which at some definite time the density $A^2$ vanishes everywhere outside a certain small region. We may suppose this region to move with the fluid, its velocity at each point being given by (37), and then the equation of conservation (36) will require the density always to vanish outside the region. There is a limit to how small the region may be, imposed by the approximation we made above in neglecting $h^2$ in the operator on the right-hand side of (31). This approximation is valid only provided

$$ h\,\Big|\frac{\partial A}{\partial q'_r}\Big| \ll \Big|A\,\frac{\partial S}{\partial q'_r}\Big|, $$

which requires that A shall vary by an appreciable fraction of itself only through a range of q' in which S varies by many times h, i.e. a range consisting of many wave-lengths of the wave function (29). Our solution is then a wave packet of the type discussed in §28 and remains so for all time.

We thus get a wave function representing a state* for which the coordinates and momenta have approximate numerical values throughout all time. Such a state in quantum theory corresponds to the states with which classical theory deals. The motion of our wave packet is given by equation (37) and is therefore, from the Hamilton-Jacobi theory of classical mechanics in which the momenta $p_r$ are replaced by $\partial S/\partial q_r$, just along the classical trajectory. This gives us a justification, of a less formal type than the analogy discussed in §32, for considering the classical equations of motion as the limiting form of the quantum ones when h → 0.

By a more accurate solution of the wave equation one can show that the accuracy with which the coordinates and momenta simultaneously have numerical values cannot remain permanently as favourable as the limit allowed by Heisenberg's principle of uncertainty, equation (42) of Chapter V; if it is initially so, it will become less favourable, the wave packet undergoing a spreading.†


35. The Free Particle 


The most fundamental and elementary application of quantum mechanics is to 
the system consisting merely of a free particle, or particle not acted on by any 
forces. The problem is still very simple when we take into account, as we shall 


* The word ‘state’ is here used with its space-time meaning.

† See Kennard, E. H., „Zur Quantenmechanik einfacher Bewegungstypen“, Zeitschrift für Physik 44(4-5) (1927), pp. 326-352 [doi:10.1007/BF01391200]; and Darwin, Charles Galton, “Free motion in the wave mechanics”, Proceedings of the Royal Society of London A 117 (1927), pp. 258-293 [doi:10.1098/rspa.1927.0179].




do here, the relativistic variation of the mass of the particle with its velocity. 
We shall use as dynamical variables the three Cartesian coordinates of the particle x, y, z, and their conjugate momenta $p_x, p_y, p_z$. In terms of these variables, the Hamiltonian in classical mechanics, equal to the energy, is

$$ H = c(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}, \qquad (38) $$

where m is the rest-mass of the particle and c is the velocity of light. We assume the Hamiltonian to be of the same form in quantum mechanics, the square root now being interpreted as the positive square root defined at the end of §11.

From the quantum conditions (9) of Chapter V, $p_x$ commutes with $p_y$ and $p_z$, and hence, from the theorem given at the end of §16, $p_x$ commutes with any function of $p_x$, $p_y$ and $p_z$ and therefore with H. It follows that $p_x$ is a constant of the motion. Similarly $p_y$ and $p_z$ are constants of the motion. These results are the same as in the classical theory. Again, the equation of motion for a coordinate, x say, is, according to (10),

$$ ih\dot x = xH - Hx = c\,x(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2} - c(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}x. $$

The right-hand side here can be evaluated by means of formula (38) of Chapter V with the roles of coordinates and momenta interchanged, so that it reads

$$ f q_r - q_r f = -ih\frac{\partial f}{\partial p_r}, \qquad (39) $$

f now being any function of the p's. This gives

$$ \dot x = \frac{\partial}{\partial p_x}\,c(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2} = \frac{c^2p_x}{H}. $$

Similarly

$$ \dot y = \frac{c^2p_y}{H}, \qquad \dot z = \frac{c^2p_z}{H}. \qquad (40) $$


These equations of motion are of the same form as in the classical theory. 
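The step from (39) to (40) amounts to the statement that $\dot x = \partial H/\partial p_x$. The quick numerical cross-check below compares $c^2p_r/H$ with a central-difference derivative of (38). It also confirms that the resulting speed stays below c. The values of m, c and the momentum are arbitrary assumptions.

```python
import math


def hamiltonian(px, py, pz, m, c):
    """Free-particle Hamiltonian (38): H = c (m^2 c^2 + p^2)^(1/2)."""
    return c * math.sqrt(m * m * c * c + px * px + py * py + pz * pz)


def velocity_formula(p, m, c):
    """Equations (40): dx/dt = c^2 p_x / H, and likewise for y and z."""
    H = hamiltonian(*p, m, c)
    return [c * c * pr / H for pr in p]


def velocity_numeric(p, m, c, d=1e-6):
    """Central-difference estimate of dH/dp_r for r = x, y, z."""
    v = []
    for r in range(3):
        up, down = list(p), list(p)
        up[r] += d
        down[r] -= d
        v.append((hamiltonian(*up, m, c) - hamiltonian(*down, m, c)) / (2 * d))
    return v


m, c = 2.0, 3.0          # assumed rest-mass and velocity of light (arbitrary units)
p = (1.5, -4.0, 10.0)    # assumed momentum components
v_exact = velocity_formula(p, m, c)
v_fd = velocity_numeric(p, m, c)
speed = math.sqrt(sum(x * x for x in v_exact))
print(v_exact, v_fd, speed)
```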

Let us consider a state that is an eigenstate of the momenta, belonging to the eigenvalues $p'_x, p'_y, p'_z$. This state must be an eigenstate of the Hamiltonian, belonging to the eigenvalue

$$ H' = c(m^2c^2 + p_x'^2 + p_y'^2 + p_z'^2)^{1/2}, \qquad (41) $$

and must therefore be a stationary state. The possible values for H' are all numbers from $mc^2$ to ∞, as in the classical theory. In a representation with the coordinates x, y, z diagonal, the representative of our stationary state at any time t will be, from (39) of Chapter V, of the form

$$ (x'y'z't|) = a\,e^{i(x'p'_x + y'p'_y + z'p'_z)/h}, $$




where a is independent of x', y', z' but may depend on the time t. From (7) we see that a varies with t according to the simple harmonic law

$$ a = a_0\,e^{-iH't/h}, $$

where $a_0$ is a constant, and hence

$$ (x'y'z't|) = a_0\,e^{i(p'_x x' + p'_y y' + p'_z z' - H't)/h}. \qquad (42) $$


Formula (42) gives the wave function representing a state with definite momentum, for the problem of a free particle. It could of course have been obtained alternatively from a direct solution of Schrödinger's wave equation (3). It is of the form of plane waves in space-time. The frequency of the waves is

$$ \nu = H'/2\pi h, \qquad (43) $$

their wave-length is

$$ \lambda = 2\pi h/(p_x'^2 + p_y'^2 + p_z'^2)^{1/2} = 2\pi h/P', \qquad (44) $$

P' being the length of the vector $(p'_x, p'_y, p'_z)$, and their motion is in the direction specified by the vector $(p'_x, p'_y, p'_z)$ with the velocity

$$ u = \lambda\nu = H'/P' = c^2/v, \qquad (45) $$

v being the classical velocity of the particle corresponding to the momentum $(p'_x, p'_y, p'_z)$. Equations (43), (44) and (45) are easily seen to hold in all Lorentz frames of reference, the expression on the right-hand side of (42) being, in fact, relativistically invariant with $p'_x, p'_y, p'_z$ and H' as the components of a 4-vector. These properties of relativistic invariance led Louis de Broglie, before the discovery of quantum mechanics, to postulate the existence of waves such as (42) associated with the motion of any particle. They are therefore known as de Broglie waves.
In the limiting case when the rest-mass m is made to tend to zero, the classical 
velocity of the particle v becomes equal to c and hence, from (45), the wave velocity 
also becomes c. The waves then become identical with the light-waves associated 
with a photon, except for the fact that they contain no reference to the polarization 
and involve a complex exponential instead of sines and cosines. Formulas (43) 
and (44) are still valid, connecting the frequency of the light-waves with the energy 
of the photon and the wave-length of the light-waves with the momentum of 
the photon. 
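The relations (41) to (45) can be tabulated numerically. The sketch below takes the frequency as H'/2πh and the wave-length as 2πh/P', so that the phase of (42) advances by 2π per period. The wave velocity u = λν and the product uv do not depend on how the factors of 2π are placed. The values of m, c and the momenta are arbitrary assumptions. The checks are that uv = c², so the wave velocity exceeds c whenever m > 0, and that u tends to c when the rest-mass is put equal to zero.

```python
import math


def de_broglie(p_mag, m=1.0, c=1.0, h=1.0):
    """Frequency, wave-length, wave velocity u and particle velocity v for momentum P'.

    h is the constant appearing in the exponent of (42).
    """
    H = c * math.sqrt(m * m * c * c + p_mag * p_mag)  # eigenvalue (41)
    nu = H / (2 * math.pi * h)                         # frequency
    wavelength = 2 * math.pi * h / p_mag               # wave-length
    u = wavelength * nu                                # wave velocity (45), equal to H'/P'
    v = c * c * p_mag / H                              # particle velocity, cf. (40)
    return nu, wavelength, u, v


m, c = 1.0, 3.0
results = [de_broglie(p, m=m, c=c) for p in (0.1, 1.0, 10.0, 1000.0)]
for nu, lam, u, v in results:
    print(f"u = {u:.6f}, v = {v:.6f}, u*v/c^2 = {u * v / c ** 2:.12f}")

_, _, u_light, _ = de_broglie(1.0, m=0.0, c=c)   # rest-mass zero: the photon limit
```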

For the state represented by (42), the probability of the particle being found in any specified small volume when an observation of its position is made is proportional to $|(x'y'z't|)|^2$ and is thus independent of the position of the volume. This provides an example of Heisenberg's principle of uncertainty, the state being one for which the momentum is accurately given and for which, in consequence, the position is completely unknown. Such a state is, of course, a limiting case which never occurs in practice. The states usually met with in practice are those represented by wave packets, which may be formed by superposing a number of waves of the type (42) belonging to slightly different values of $(p'_x, p'_y, p'_z)$.
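The ordinary formula in hydrodynamics for the velocity of such a wave packet, i.e. the group velocity of the waves, is

$$ \frac{d\nu}{d(1/\lambda)}, $$

which gives, from (43) and (44),

$$ \frac{d\nu}{d(1/\lambda)} = \frac{dH'}{dP'} = \frac{d}{dP'}\,c(m^2c^2 + P'^2)^{1/2} = \frac{c^2P'}{H'} = v. \qquad (46) $$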


This is just the classical velocity of the particle and confirms the general theory of 
the preceding section. 
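To see (46) at work, the sketch below superposes one-dimensional plane waves of the form (42) with Gaussian weights about an assumed central momentum $p_0$. It then locates the maximum of the modulus of the superposition after a time t. The units m = c = h = 1 and all packet parameters are illustrative assumptions. The maximum sits near vt, where $v = c^2p_0/H'$ is the classical velocity, and not at ut given by the wave velocity (45).

```python
import cmath
import math

# Illustrative units and packet parameters (assumptions, not from the text).
m = c = h = 1.0
p0, sigma, dp, n_modes = 1.0, 0.05, 0.01, 61


def energy(p):
    """One-dimensional form of the eigenvalue (41): H' = c (m^2 c^2 + p^2)^(1/2)."""
    return c * math.sqrt(m * m * c * c + p * p)


momenta = [p0 + (j - n_modes // 2) * dp for j in range(n_modes)]
weights = [math.exp(-((p - p0) ** 2) / (2 * sigma ** 2)) for p in momenta]


def packet(x, t):
    """Sum of plane waves exp(i(p x - H' t)/h) with Gaussian weights.

    The momentum spacing dp makes the sum repeat in x with period 2*pi*h/dp (about 628),
    far longer than the window searched below.
    """
    return sum(w * cmath.exp(1j * (p * x - energy(p) * t) / h)
               for p, w in zip(momenta, weights))


def peak_position(t, x_min, x_max, dx=0.25):
    """Grid point at which |packet(x, t)| is largest."""
    xs = [x_min + k * dx for k in range(int(round((x_max - x_min) / dx)) + 1)]
    return max(xs, key=lambda x: abs(packet(x, t)))


t = 100.0
v_classical = c * c * p0 / energy(p0)   # classical velocity, cf. (40)
x_peak = peak_position(t, 0.0, 150.0)
print(f"packet peak at x = {x_peak}, classical particle at x = {v_classical * t:.2f}")
```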


36. The Harmonic Oscillator 


As another example of a simple system treated according to quantum mechanics, we may take the harmonic oscillator, neglecting relativistic variation of mass with velocity. We have as variables only one coordinate q and its conjugate momentum p, and we take the Hamiltonian to be, as in the classical theory,

$$ H = \frac{1}{2m}(p^2 + m^2\omega^2q^2), \qquad (47) $$

m being the mass of the oscillating particle and ω being 2π times the frequency. With this Hamiltonian it is easily verified that the equations of motion for q and p are

$$ \dot q = p/m, \qquad \dot p = -m\omega^2q, \qquad (48) $$

precisely as in the classical theory.

We must now determine the eigenvalues of the Hamiltonian. This could be done directly by solving the differential equation (37) of Chapter V. An alternative method, based on more primitive arguments, is as follows. We have from straightforward non-commutative algebra, with the help of the quantum condition (12) of Chapter V,

$$ \begin{aligned}
(p + im\omega q)(p - im\omega q) &= p^2 + m^2\omega^2q^2 + im\omega(qp - pq) \\
&= p^2 + m^2\omega^2q^2 - mh\omega \\
&= 2mH - mh\omega,
\end{aligned} \qquad (49) $$




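and similarly,

$$ (p - im\omega q)(p + im\omega q) = 2mH + mh\omega. \qquad (50) $$

Hence

$$ \begin{aligned}
(2mH - mh\omega)(p + im\omega q) &= (p + im\omega q)(p - im\omega q)(p + im\omega q) \\
&= (p + im\omega q)(2mH + mh\omega).
\end{aligned} \qquad (51) $$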


We now introduce a Heisenberg representation in which H is diagonal. We shall assume that H by itself forms a complete set of commuting observables, so that its eigenvalues can be used as labels in the representation. The justification for this assumption is that it leads, as we shall see, without inconsistency to definite representatives for q and p. Expressing (51) in terms of representatives, we obtain

$$ \{2mH' - mh\omega\}(H'|p + im\omega q|H'') = (H'|p + im\omega q|H'')\{2mH'' + mh\omega\} $$

or

$$ \{H' - H'' - h\omega\}(H'|p + im\omega q|H'') = 0. \qquad (52) $$


This shows that all the matrix elements $(H'|p + im\omega q|H'')$ of the representative of $p + im\omega q$ vanish except those for which

$$ H' - H'' - h\omega = 0. \qquad (53) $$


Taking the conjugate complex of this result in accordance with (18) of Chapter III, we see that all the matrix elements $(H''|p - im\omega q|H')$ of the representative of $p - im\omega q$ vanish except those for which (53) holds. It follows that in the equation

$$ \sum_{H''}(H'|p + im\omega q|H'')(H''|p - im\omega q|H') = (H'|2mH - mh\omega|H') = 2mH' - mh\omega = 2m\{H' - \tfrac12 h\omega\}, \qquad (54) $$


which we obtain by expressing (49) in terms of representatives and taking a diagonal matrix element of each side, referring to an arbitrarily chosen eigenvalue H', all the terms in the sum on the left-hand side vanish except (at most) the one for which H'' = H' − hω, if H' − hω is an eigenvalue of H, and if it is not, then every term on the left-hand side of (54) vanishes without exception. In the first case $H' - \frac12 h\omega$ is positive or zero, since $(H'|p + im\omega q|H' - h\omega)$ and $(H' - h\omega|p - im\omega q|H')$ are conjugate complex numbers, so that their product is the square of a modulus; in the second case $H' - \frac12 h\omega$ is certainly zero. We can therefore draw the conclusions that, if H' is any eigenvalue of H, then H' is positive and either H' − hω is another eigenvalue




or $H' = \frac12 h\omega$. Similarly, by expressing (50) in terms of representatives and taking the diagonal matrix element of each side referring to H', we can draw the conclusion that either H' + hω is another eigenvalue or $H' = -\frac12 h\omega$. The second alternative here is ruled out, since H' must always be positive. It follows finally from all this that the only possible set of eigenvalues for H is the series

$$ \tfrac12 h\omega,\ \tfrac32 h\omega,\ \tfrac52 h\omega,\ \tfrac72 h\omega,\ \ldots, \qquad (55) $$

extending to infinity. These are the energy levels for the simple harmonic oscillator.
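We can now easily obtain the representatives of q and p. Equation (54) reduces to

$$ (H'|p + im\omega q|H' - h\omega)(H' - h\omega|p - im\omega q|H') = 2m\{H' - \tfrac12 h\omega\}. $$

The two factors on the left here are conjugate complex numbers and hence

$$ \begin{aligned}
(H'|p + im\omega q|H' - h\omega) &= (2m)^{1/2}\{H' - \tfrac12 h\omega\}^{1/2}e^{i\gamma'}, \\
(H' - h\omega|p - im\omega q|H') &= (2m)^{1/2}\{H' - \tfrac12 h\omega\}^{1/2}e^{-i\gamma'},
\end{aligned} $$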


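where γ' is some real number, which may be a function of H'. From (15) we see that $(H'|p + im\omega q|H' - h\omega)$ must vary with t according to the law

$$ (H'|p + im\omega q|H' - h\omega) = \text{const.}\;e^{i\omega t}, $$

and hence γ' must vary with t according to the law

$$ \gamma' = \omega t + \gamma'_0, $$

where $\gamma'_0$ is a constant. We can make $\gamma'_0$ zero by a suitable choice of the phase factors of our representation. We then have

$$ \begin{aligned}
(H'|p + im\omega q|H' - h\omega) &= (2m)^{1/2}\{H' - \tfrac12 h\omega\}^{1/2}e^{i\omega t}, \\
(H' - h\omega|p - im\omega q|H') &= (2m)^{1/2}\{H' - \tfrac12 h\omega\}^{1/2}e^{-i\omega t}.
\end{aligned} \qquad (56) $$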

These formulas give all the non-vanishing matrix elements of the representatives of $p + im\omega q$ and $p - im\omega q$, and thus of the representatives of p and q.
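Since (56) fixes every non-vanishing matrix element, the representatives of q and p can be built explicitly. The pure-Python sketch below forms N × N sections of the matrices at t = 0. It then checks that they reproduce the quantum condition qp − pq = ih and the energy levels (55). The truncation size N and the values of m, ω and h are assumptions. Cutting off an infinite matrix spoils its last row and column, so the diagonal checks stop one level short.

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(a, A, b, B):
    """Return a*A + b*B for square matrices stored as lists of rows."""
    n = len(A)
    return [[a * A[i][j] + b * B[i][j] for j in range(n)] for i in range(n)]


def oscillator_matrices(N, m, w, h):
    """Truncated matrices of q and p at t = 0, built from (56).

    Row/column n refers to the level H' = (n + 1/2) h w, in ascending order.
    """
    up = [[0j] * N for _ in range(N)]    # p + i m w q: non-zero only at (H'|...|H' - hw)
    down = [[0j] * N for _ in range(N)]  # p - i m w q: non-zero only at (H' - hw|...|H')
    for n in range(1, N):
        H_n = (n + 0.5) * h * w
        element = math.sqrt(2 * m * (H_n - 0.5 * h * w))
        up[n][n - 1] = element
        down[n - 1][n] = element
    p = lincomb(0.5, up, 0.5, down)
    q = lincomb(1 / (2j * m * w), up, -1 / (2j * m * w), down)
    return q, p


N, m, w, h = 8, 2.0, 3.0, 1.0   # assumed values
q, p = oscillator_matrices(N, m, w, h)
commutator = lincomb(1, matmul(q, p), -1, matmul(p, q))               # qp - pq
H = lincomb(1 / (2 * m), matmul(p, p), m * w * w / 2, matmul(q, q))   # Hamiltonian (47)

for n in range(N - 1):   # the last level is affected by the truncation
    print(n, round(H[n][n].real, 12), commutator[n][n])
```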

In the classical treatment of periodic and multiply-periodic dynamical systems it is often convenient to make use of action and angle variables. We can introduce corresponding variables in the quantum theory. In our present problem of the harmonic oscillator we can define the action variable J by

$$ J = H/\omega - \tfrac12 h. \qquad (57) $$




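It is a constant of the motion and its eigenvalues are integral multiples of h greater than or equal to zero. Thus its matrix representative in the Heisenberg representation is

$$ \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots \\
0 & h & 0 & 0 & 0 & \cdots \\
0 & 0 & 2h & 0 & 0 & \cdots \\
0 & 0 & 0 & 3h & 0 & \cdots \\
0 & 0 & 0 & 0 & 4h & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} $$

when the rows and columns are arranged in order of ascending energy-levels.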


To define the angle variable we introduce the two matrices

$$ \begin{pmatrix}
0 & 0 & 0 & 0 & \cdots \\
1 & 0 & 0 & 0 & \cdots \\
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
\vdots & \vdots & \vdots & \ddots & \ddots
\end{pmatrix}, \qquad
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
0 & 0 & 0 & 1 & \cdots \\
0 & 0 & 0 & 0 & \ddots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, $$

in which the non-vanishing elements are just to the left and just to the right of the principal diagonal respectively, and call the variables that they represent at time t = 0 $e^{iw}$ and $e^{-iw}$ respectively. These two matrices, according to §15, represent conjugate complex dynamical variables, in agreement with what is implied by the notation $e^{iw}$ and $e^{-iw}$. This notation implies further, however, that the two matrices are the reciprocals of one another, and this is not altogether true. The matrix representing the product $e^{-iw}e^{iw}$ is, in fact, just the unit matrix, but that representing $e^{iw}e^{-iw}$ differs from the unit matrix through having zero for its first diagonal element. Thus


$$ e^{-iw}e^{iw} = 1, \qquad e^{iw}e^{-iw} \neq 1. \qquad (58) $$
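The statements about the two shift matrices can be checked on finite sections of them. In the sketch below, N = 6 and h = 1 are arbitrary choices. The product $e^{-iw}e^{iw}$ comes out as the unit matrix except in its last diagonal element; that zero is an artefact of cutting the infinite matrix off. The product $e^{iw}e^{-iw}$ has its zero in the first diagonal element, as stated in the text. The relations (59) hold exactly for these sections.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def shift_matrices(N):
    """N x N sections of the matrices for e^{iw} (ones just left of the diagonal)
    and e^{-iw} (ones just right of the diagonal)."""
    e_plus = [[1.0 if i == j + 1 else 0.0 for j in range(N)] for i in range(N)]
    e_minus = [[1.0 if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]
    return e_plus, e_minus


N, h = 6, 1.0
e_plus, e_minus = shift_matrices(N)
J = diag([n * h for n in range(N)])
J_plus_h = diag([n * h + h for n in range(N)])
J_minus_h = diag([n * h - h for n in range(N)])

minus_plus = matmul(e_minus, e_plus)   # e^{-iw} e^{iw}
plus_minus = matmul(e_plus, e_minus)   # e^{iw} e^{-iw}
print([minus_plus[i][i] for i in range(N)])  # final zero comes only from the truncation
print([plus_minus[i][i] for i in range(N)])  # leading zero is the genuine defect (58)

# Relations (59): J e^{iw} = e^{iw}(J + h) and J e^{-iw} = e^{-iw}(J - h).
rel_1 = matmul(J, e_plus) == matmul(e_plus, J_plus_h)
rel_2 = matmul(J, e_minus) == matmul(e_minus, J_minus_h)
```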

The variables $e^{iw}$ and $e^{-iw}$, defined above through their matrix representatives, are the best quantum analogues that we can get to the exponentials of i and −i times the angle variable of the classical theory. They have many properties analogous to those of their classical counterparts, and their only serious defect is that $e^{iw}e^{-iw}$ is not precisely equal to unity. Thus, for example, we obtain at once from the matrices the relations

$$ Je^{iw} = e^{iw}(J + h), \qquad Je^{-iw} = e^{-iw}(J - h), \qquad (59) $$




which are equivalent to the classical relations

$$ [e^{iw}, J] = ie^{iw}, \qquad [e^{-iw}, J] = -ie^{-iw}. $$

Equations (59), when compared with equation (17) of Chapter V with c = ±1, are seen to be consistent with the view that J and w are a pair of canonically conjugate dynamical variables satisfying the quantum condition

$$ wJ - Jw = ih, $$

although actually this relation is meaningless since we cannot define w itself but only $e^{\pm iw}$. Again, the dynamical variable $e^{iw}$ at an arbitrary time t must be represented by a matrix whose elements vary with t according to the Heisenberg law $e^{i(H' - H'')t/h}$. Since all the matrix elements vanish except those referring to consecutive energy-levels for which $H' - H'' = h\omega$, every matrix element will vary with the time according to the law $e^{i\omega t}$. This corresponds to the fact that in the classical theory w increases linearly with t at the rate ω.

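The dynamical variables q and p can be expressed in terms of the action and angle variables. From (56) we see that

$$ \begin{aligned}
p + im\omega q &= (2m)^{1/2}\{H - \tfrac12 h\omega\}^{1/2}e^{iw} = (2m\omega)^{1/2}J^{1/2}e^{iw}, \\
p - im\omega q &= (2m)^{1/2}e^{-iw}\{H - \tfrac12 h\omega\}^{1/2} = (2m\omega)^{1/2}e^{-iw}J^{1/2}.
\end{aligned} $$

Thus

$$ \begin{aligned}
q &= (2m\omega)^{-1/2}\{-iJ^{1/2}e^{iw} + ie^{-iw}J^{1/2}\}, \\
p &= (\tfrac12 m\omega)^{1/2}\{J^{1/2}e^{iw} + e^{-iw}J^{1/2}\}.
\end{aligned} \qquad (60) $$

We see from these equations that q and p, when expressed in terms of the action and angle variables, involve them only through the two combinations $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$. Further, all dynamical variables that we ordinarily have to deal with to obtain physical results are algebraic functions of q and p and therefore, when expressed in terms of the action and angle variables, will involve them only through the two quantities $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$. Now it is easily verified from the matrix representatives that these two quantities are respectively equal to

$$ J^{1/2}e^{iw} = e^{iw}(J + h)^{1/2} \quad\text{and}\quad e^{-iw}J^{1/2} = (J + h)^{1/2}e^{-iw}, \qquad (61) $$

and that their products in either order are

$$ \begin{aligned}
(J^{1/2}e^{iw})(e^{-iw}J^{1/2}) &= J, \\
(e^{-iw}J^{1/2})(J^{1/2}e^{iw}) &= (J + h)^{1/2}e^{-iw}e^{iw}(J + h)^{1/2} = J + h.
\end{aligned} $$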




These results hold in spite of the inequality in (58). They show that when we are dealing with ordinary dynamical variables which are algebraic functions of q and p, and which therefore involve the action and angle variables only through the two quantities $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$, we may count $e^{iw}$ and $e^{-iw}$ as truly reciprocal quantities without getting into error. Thus we can freely use the action and angle variables in complete analogy with the classical theory without getting incorrect results.
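The identities (61) and the two products can be verified on finite sections of the matrices. The size and the value of h below are arbitrary choices. The product $(J^{1/2}e^{iw})(e^{-iw}J^{1/2})$ equals J on every diagonal entry. This is because the zero in the first diagonal element of $e^{iw}e^{-iw}$ is multiplied by the zero first eigenvalue of J, which is why the defect (58) does no harm here. The reverse product equals J + h except in the last entry, which is affected by the truncation.

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for row_x, row_y in zip(X, Y) for x, y in zip(row_x, row_y))


N, h = 7, 1.0
e_plus = [[1.0 if i == j + 1 else 0.0 for j in range(N)] for i in range(N)]   # e^{iw}
e_minus = [[1.0 if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]  # e^{-iw}
sqrt_J = diag([math.sqrt(n * h) for n in range(N)])
sqrt_J_plus_h = diag([math.sqrt(n * h + h) for n in range(N)])

A = matmul(sqrt_J, e_plus)    # J^{1/2} e^{iw}
B = matmul(e_minus, sqrt_J)   # e^{-iw} J^{1/2}
check_61a = close(A, matmul(e_plus, sqrt_J_plus_h))
check_61b = close(B, matmul(sqrt_J_plus_h, e_minus))

AB = matmul(A, B)  # expected: J on every diagonal entry
BA = matmul(B, A)  # expected: J + h, except the last entry (truncation)
print([round(AB[n][n], 12) for n in range(N)])
print([round(BA[n][n], 12) for n in range(N)])
```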
The wave equation for the harmonic oscillator with Hamiltonian (47) is

$$ ih\frac{\partial}{\partial t}(q't|) = \frac{1}{2m}\Big\{-h^2\frac{\partial^2}{\partial q'^2} + m^2\omega^2q'^2\Big\}(q't|). $$

The wave functions representing stationary states are those periodic solutions of this equation for which the operator $ih\,\partial/\partial t$ is the same as multiplication by an energy eigenvalue H', and they therefore satisfy

$$ H'(q'|) = \frac{1}{2m}\Big\{-h^2\frac{d^2}{dq'^2} + m^2\omega^2q'^2\Big\}(q'|). \qquad (62) $$

The general solution of this equation has been given by Erwin Schrödinger.† It provides us with the transformation function $(q'|H')$ connecting the q- and H-representations, one of which, it may be noted, has a discrete set of basic states while the other has a continuous range.

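We shall here obtain some of the solutions representing states of lowest energy. Equation (62) reduces to

$$ \frac{d^2}{dq'^2}(q'|) + \Big(\frac{2n+1}{a^2} - \frac{q'^2}{a^4}\Big)(q'|) = 0, \qquad (63) $$

where $a^2$ is the number h/mω and H' has been put equal to $(n + \frac12)h\omega$, n being a positive integer or zero. Put

$$ (q'|) = f(q')\,e^{-q'^2/2a^2}. $$

Equation (63) now becomes

$$ \frac{d^2f}{dq'^2} - \frac{2q'}{a^2}\frac{df}{dq'} - \frac{1}{a^2}f + \frac{q'^2}{a^4}f + \Big(\frac{2n+1}{a^2} - \frac{q'^2}{a^4}\Big)f = 0, $$

or

$$ \frac{d^2f}{dq'^2} - \frac{2q'}{a^2}\frac{df}{dq'} + \frac{2n}{a^2}f = 0. $$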


† Erwin Schrödinger (1926), „Quantisierung als Eigenwertproblem“, Annalen der Physik und Chemie 384(6), pp. 489-527. doi:10.1002/andp.19263840602




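The solution of this equation, with n any non-negative integer, is a finite power series in q'. For n = 0, 1, 2, 3 the solutions are easily verified to be

$$ f = 1, \qquad f = q', \qquad f = q'^2 - \tfrac12 a^2, \qquad f = q'^3 - \tfrac32 a^2q'. $$

The successive eigenfunctions are thus

$$ \begin{aligned}
(q'|0) &= e^{-q'^2/2a^2}, & (q'|1) &= q'e^{-q'^2/2a^2}, \\
(q'|2) &= (q'^2 - \tfrac12 a^2)e^{-q'^2/2a^2}, & (q'|3) &= (q'^3 - \tfrac32 q'a^2)e^{-q'^2/2a^2}.
\end{aligned} \qquad (64) $$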
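The polynomial solutions can be generated and checked exactly with rational arithmetic. The sketch below uses the recurrence $f_{n+1} = q'f_n - \frac12 n a^2 f_{n-1}$. This is the standard recurrence for Hermite polynomials rescaled to leading coefficient 1; it is an addition, not something stated in the text. The code verifies that each $f_n$ satisfies $f'' - (2q'/a^2)f' + (2n/a^2)f = 0$. The value of $a^2$ is an arbitrary assumption, kept exact as a fraction. Normalization of the eigenfunctions is not addressed.

```python
from fractions import Fraction


def poly_mul_q(f):
    """Multiply a polynomial (coefficients, lowest power first) by q'."""
    return [Fraction(0)] + list(f)


def poly_add(f, g, scale=Fraction(1)):
    """Return f + scale * g."""
    n = max(len(f), len(g))
    f = list(f) + [Fraction(0)] * (n - len(f))
    g = list(g) + [Fraction(0)] * (n - len(g))
    return [a + scale * b for a, b in zip(f, g)]


def derivative(f):
    return [k * f[k] for k in range(1, len(f))] or [Fraction(0)]


def polynomial_solutions(n_max, a2):
    """f_0 = 1, f_1 = q', f_{n+1} = q' f_n - (n a^2 / 2) f_{n-1}."""
    sols = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        sols.append(poly_add(poly_mul_q(sols[n]), sols[n - 1], Fraction(-n) * a2 / 2))
    return sols[: n_max + 1]


def residual(f, n, a2):
    """Coefficients of f'' - (2 q'/a^2) f' + (2n/a^2) f; all should vanish."""
    d1 = derivative(f)
    d2 = derivative(d1)
    r = poly_add(d2, poly_mul_q(d1), Fraction(-2) / a2)
    return poly_add(r, f, Fraction(2 * n) / a2)


a2 = Fraction(3, 7)   # arbitrary value of a^2 = h/(m w)
sols = polynomial_solutions(5, a2)
for n, f in enumerate(sols):
    print(n, [str(c) for c in f], all(c == 0 for c in residual(f, n, a2)))
```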


37. The Gibbs Ensemble 


In our work up to the present we have been assuming all along that 
our dynamical system at each instant of time is in a definite state, that is 
to say, its motion is specified as completely and accurately as is possible 
without conflicting with the general principles of the theory. In the classical 
theory this would mean, of course, that all the coordinates and momenta have 
specified values. Now we may be interested in a motion which is specified to 
a lesser extent than this maximum possible. The present section will be devoted 
to the methods to be used in such a case. 

The procedure in classical mechanics is to introduce what is called 
a Gibbs ensemble, the idea of which is as follows. We consider all the dynamical 
coordinates and momenta as Cartesian coordinates in a certain space, 
the phase space, whose number of dimensions is twice the number of degrees 
of freedom of the system. Any state of the system can then be represented by 
a point in this space. This point will move according to the classical equations of 
motion (11). Suppose, now, that we are not given that the system is in a definite 
state at any time, but only that it is in one or other of a number of possible states 
according to a definite probability law. We should then be able to represent it by 
a fluid in the phase space, the mass of fluid in any volume of the phase space 
being the total probability of the system being in any state whose representative 
point lies in that volume. Each particle of the fluid will be moving according 
to the equations of motion (11). If we introduce the density ρ of the fluid at
any point, equal to the probability per unit volume of phase space of the system 
being in the neighbourhood of the corresponding state, we shall have the equation 
of conservation 




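$$ \begin{aligned}
\frac{\partial\rho}{\partial t} &= -\sum_r\Big\{\frac{\partial}{\partial q_r}\Big(\rho\frac{dq_r}{dt}\Big) + \frac{\partial}{\partial p_r}\Big(\rho\frac{dp_r}{dt}\Big)\Big\} \\
&= -\sum_r\Big\{\frac{\partial\rho}{\partial q_r}\frac{\partial H}{\partial p_r} - \frac{\partial\rho}{\partial p_r}\frac{\partial H}{\partial q_r}\Big\} \\
&= -[\rho, H].
\end{aligned} \qquad (65) $$

This may be considered as the equation of motion for the fluid, since it determines the density ρ for all time if ρ is given initially as a function of the q's and p's. It is, apart from the minus sign, of the same form as the ordinary equation of motion (12) for a dynamical variable.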
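For a harmonic oscillator the classical flow is a rotation in phase space, so a density carried along by the motion can be written down exactly. The mass, frequency and initial density below are assumptions chosen for illustration. The sketch checks (65) at a few points by comparing ∂ρ/∂t with −[ρ, H], both computed by central differences.

```python
import math

m, w = 1.5, 0.8   # assumed mass and angular frequency


def H(q, p):
    return (p * p + (m * w * q) ** 2) / (2 * m)


def rho0(q, p):
    """An arbitrary initial phase-space density (assumed for illustration)."""
    return math.exp(-(q - 1.0) ** 2 - 2.0 * (p + 0.3) ** 2)


def rho(q, p, t):
    """Density carried by the flow: rho(q, p, t) = rho0 at the point that moves to (q, p) in time t."""
    c, s = math.cos(w * t), math.sin(w * t)
    return rho0(q * c - p * s / (m * w), p * c + m * w * q * s)


def poisson_bracket(f, g, q, p, d=1e-5):
    """[f, g] = df/dq dg/dp - df/dp dg/dq, by central differences."""
    dfdq = (f(q + d, p) - f(q - d, p)) / (2 * d)
    dfdp = (f(q, p + d) - f(q, p - d)) / (2 * d)
    dgdq = (g(q + d, p) - g(q - d, p)) / (2 * d)
    dgdp = (g(q, p + d) - g(q, p - d)) / (2 * d)
    return dfdq * dgdp - dfdp * dgdq


worst = 0.0
d = 1e-5
for t in (0.0, 0.7, 2.0):
    for q, p in ((0.2, 0.1), (1.3, -0.5), (-0.4, 0.8)):
        drho_dt = (rho(q, p, t + d) - rho(q, p, t - d)) / (2 * d)
        bracket = poisson_bracket(lambda x, y: rho(x, y, t), H, q, p)
        worst = max(worst, abs(drho_dt + bracket))   # (65): drho/dt + [rho, H] = 0

print(f"largest deviation from (65): {worst:.2e}")
```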
The requirement that the total probability of the system being in any state shall be unity gives us a normalizing condition for ρ,

$$ \int \rho\,dq\,dp = 1, \qquad (66) $$

the integration being over the whole of phase space and the single differential dq or dp being written to denote the product of all the dq's or dp's. If β denotes any function of the dynamical variables, the average value of β will be

$$ \int \beta\rho\,dq\,dp. \qquad (67) $$
It makes only a trivial alteration in the theory, but often facilitates discussion, if we work with a density ρ differing from the above one by a positive constant factor, k say, so that we have instead of (66)

$$ \int \rho\,dq\,dp = k. \qquad (68) $$


With this density we can picture the fluid as representing a number k of similar 
dynamical systems, all following through their motions independently in the same 
place, without any mutual disturbance or interaction. The density at any point 
would then be the probable or average number of systems in the neighbourhood 
of any state per unit volume of phase space, and expression (67) would give 
the average total value of β for all the systems. Such a set of dynamical systems,
which is the ensemble introduced by Josiah Willard Gibbs, is usually not realizable 
in practice, except as a rough approximation, but it forms all the same a useful 
theoretical abstraction. 

We shall now see that there exists a corresponding density ρ in quantum
mechanics, having properties analogous to the above. It was first introduced by 
John von Neumann. Its existence is rather surprising in view of the fact that 
phase space has no meaning in quantum mechanics, there being no possibility of 
assigning numerical values simultaneously to the q’s and p’s. 

We consider a dynamical system which is at a certain time in one or other of 
a number of possible states according to some given probability law. These states 
may be either a discrete set or a continuous range, or both together. We shall 
here take for definiteness the case of a discrete set and suppose them labelled 




by a parameter m. Let their normalized representatives in some representation be $(\xi'|m)$ and let the probability of the system being in the m-th state be $P_m$. We then define the quantum density ρ through its representative:

$$ (\xi'|\rho|\xi'') = \sum_m (\xi'|m)P_m(m|\xi''). \qquad (69) $$


Let p’ be any eigenvalue of p and (£'|) an eigen-w belonging to this eigenvalue, 
so that 


ii SE" ln) Pm (me) de" (€"|) = pC), 


if we assume the ξ's to take on continuous ranges of values, for definiteness.
Multiplying this equation by (|ξ'), the conjugate complex of (ξ'|), and integrating
over all ξ', we get


$$\iint \sum_m (|\xi')\,d\xi'\,(\xi'|m)\,P_m\,(m|\xi'')\,d\xi''\,(\xi''|) = \rho' \int (|\xi')\,d\xi'\,(\xi'|),$$


which may be written

$$\sum_m P_m \left|\int (|\xi')\,d\xi'\,(\xi'|m)\right|^2 = \rho' \int |(\xi'|)|^2\,d\xi'.$$


Now P_m, being a probability, can never be negative, so the left-hand side is
positive or zero, while the integral on the right-hand side is positive. It follows
that ρ' cannot be negative. Thus ρ has no negative eigenvalues, in analogy with
the fact that the classical density ρ is never negative.
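The positivity argument can be checked numerically in a finite representation. The Python sketch below is an illustration under assumptions of our own, not part of the theory: a discrete basis of three states, four randomly generated normalized states (ξ'|m), and made-up probabilities P_m. It builds ρ from (69) and compares (φ|ρ|φ) with Σ_m P_m |(φ|m)|^2 for random vectors φ; the helper names are hypothetical.

```python
import random

def normalize(v):
    """Scale a list of complex amplitudes to unit length."""
    norm = sum(abs(c) ** 2 for c in v) ** 0.5
    return [c / norm for c in v]

def density_matrix(states, probs):
    """rho[i][j] = sum_m (i|m) P_m (m|j): a finite-basis form of (69)."""
    dim = len(states[0])
    rho = [[0j] * dim for _ in range(dim)]
    for psi, p in zip(states, probs):
        for i in range(dim):
            for j in range(dim):
                rho[i][j] += psi[i] * p * psi[j].conjugate()
    return rho

def diagonal_element(rho, phi):
    """(phi|rho|phi), where phi[i] holds the components (i|phi)."""
    dim = len(phi)
    return sum(phi[i].conjugate() * rho[i][j] * phi[j]
               for i in range(dim) for j in range(dim))

def random_state(dim, rng):
    return normalize([complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)])

rng = random.Random(1)
dim = 3
states = [random_state(dim, rng) for _ in range(4)]
probs = [0.1, 0.2, 0.3, 0.4]  # illustrative probabilities P_m, summing to 1
rho = density_matrix(states, probs)

checks = []
for _ in range(20):
    phi = random_state(dim, rng)
    lhs = diagonal_element(rho, phi)
    # sum_m P_m |(phi|m)|^2, with (phi|m) = sum_i conj((i|phi)) (i|m)
    rhs = sum(p * abs(sum(phi[i].conjugate() * psi[i] for i in range(dim))) ** 2
              for psi, p in zip(states, probs))
    checks.append((lhs, rhs))

for lhs, rhs in checks[:3]:
    print(f"(phi|rho|phi) = {lhs.real:.6f}{lhs.imag:+.1e}j,  sum_m P_m|(phi|m)|^2 = {rhs:.6f}")
```

Because (φ|ρ|φ) equals a sum of non-negative terms for every φ, choosing φ to be an eigenvector of ρ gives ρ' ≥ 0, as found above.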

Let us now obtain the equation of motion for our quantum ρ. The (ξ'|m)'s
and (m|ξ'')'s in (69) will vary with the time in accordance with Schrödinger's
wave equation (3) and its conjugate complex (4), while the P_m's will remain
constant, since the system, so long as it is left undisturbed, cannot change over
from being represented by one wave function to being represented by another,
so that the probability of its being represented by any particular wave function
must remain constant. We thus have from (69)†


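$$i\hbar\frac{\partial}{\partial t}(\xi'|\rho|\xi'') = \sum_m \int \Big\{(\xi'|H|\xi''')(\xi'''|m)\,P_m\,(m|\xi'') - (\xi'|m)\,P_m\,(m|\xi''')(\xi'''|H|\xi'')\Big\}\,d\xi'''$$

$$= \int \Big\{(\xi'|H|\xi''')(\xi'''|\rho|\xi'') - (\xi'|\rho|\xi''')(\xi'''|H|\xi'')\Big\}\,d\xi''',$$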


†[The variable of integration, ξ''', is written outside the integrands instead of Dirac's 'clever'
repeat inside the integrands' terms.]




by using (69) again. This result may be written symbolically 


$$i\hbar\,\dot\rho = H\rho - \rho H \qquad (70)$$
or
$$\dot\rho = -[\rho, H]$$


and is thus the proper quantum analogue of the classical equation of motion (65). 
Our quantum p, like the classical one, is determined for all time if it is 
given initially. 
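A quick numerical check of (70) can be made under assumptions that are not in the text: a two-state system with the hypothetical Hamiltonian H = Gσ_x, units with ħ = 1, and two illustrative states with fixed probabilities. In this sketch each state is evolved exactly by Schrödinger's equation, ρ is rebuilt from (69) at neighbouring times, and a central finite difference of iρ̇ is compared with Hρ − ρH.

```python
import math

G = 0.7  # hypothetical coupling; H = G * sigma_x in units with hbar = 1
H = [[0j, G + 0j], [G + 0j, 0j]]

def evolve(psi, t):
    """Exact solution of i d(psi)/dt = H psi: exp(-iHt) = cos(Gt) I - i sin(Gt) sigma_x."""
    c, s = math.cos(G * t), math.sin(G * t)
    return [c * psi[0] - 1j * s * psi[1], -1j * s * psi[0] + c * psi[1]]

def rho_at(t, states, probs):
    """Equation (69) with every state evolved to time t and each P_m held fixed."""
    rho = [[0j, 0j], [0j, 0j]]
    for psi0, p in zip(states, probs):
        psi = evolve(psi0, t)
        for i in range(2):
            for j in range(2):
                rho[i][j] += psi[i] * p * psi[j].conjugate()
    return rho

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

states = [[1 + 0j, 0j], [0.6 + 0j, 0.8j]]  # two normalized states (illustrative)
probs = [0.25, 0.75]
t, dt = 0.9, 1e-5

r_now = rho_at(t, states, probs)
r_plus, r_minus = rho_at(t + dt, states, probs), rho_at(t - dt, states, probs)
# Left side of (70): i * d(rho)/dt, by a central difference.
lhs = [[1j * (r_plus[i][j] - r_minus[i][j]) / (2 * dt) for j in range(2)] for i in range(2)]
# Right side of (70): H rho - rho H.
h_rho, rho_h = matmul(H, r_now), matmul(r_now, H)
rhs = [[h_rho[i][j] - rho_h[i][j] for j in range(2)] for i in range(2)]

max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(f"largest deviation from (70): {max_error:.2e}")
```

The central difference is accurate to order dt^2, so the printed deviation should be many orders of magnitude smaller than the entries of Hρ − ρH.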

From the assumption of §12, the average value of any observable β when
the system is in the state represented by (ξ'|m) is


$$\iint (m|\xi')\,d\xi'\,(\xi'|\beta|\xi'')\,d\xi''\,(\xi''|m).$$


Hence if the system is distributed over the various states represented by the (ξ'|m)'s
according to the probability law P_m, the average value of β will be


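$$\sum_m \iint (m|\xi')\,d\xi'\,(\xi'|\beta|\xi'')\,d\xi''\,(\xi''|m)\,P_m = \iint (\xi'|\beta|\xi'')\,d\xi''\,(\xi''|\rho|\xi')\,d\xi'$$

$$= \int (\xi'|\beta\rho|\xi')\,d\xi' = \int (\xi'|\rho\beta|\xi')\,d\xi'. \qquad (71)$$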


This is the analogue of the expression (67) of the classical theory. Whereas in
the classical theory we have to multiply β by ρ and take the integral of the product
over all phase space, in the quantum theory we have to multiply β by ρ and
'integrate along the diagonal' in the representative of the product. We have further,
using the condition that the (ξ'|m)'s are normalized,


$$\int (\xi'|\rho|\xi')\,d\xi' = \sum_m \int (\xi'|m)\,P_m\,(m|\xi')\,d\xi' = \sum_m P_m = 1, \qquad (72)$$


since the total probability of the system being in any state is unity. This is 
the analogue of equation (66). One more result, which follows directly from 
expression (35) of Chapter IV for interpreting representatives of states, is that 
the probability of the ξ's having values in the neighbourhood of ξ' per unit range
of the ξ''s is


$$\sum_m |(\xi'|m)|^2\,P_m = (\xi'|\rho|\xi'). \qquad (73)$$


This gives a physical meaning to the integrand on the left-hand side of (72). 
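Results (71), (72) and (73) can be checked in a two-dimensional representation. In this sketch the states, the probabilities and the Hermitian matrix `beta` standing for the observable are all hypothetical choices. The code compares Σ_m P_m (m|β|m) with the diagonal sums of βρ and ρβ, checks that the diagonal sum of ρ is 1, and checks that the diagonal elements of ρ reproduce Σ_m |(ξ'|m)|^2 P_m.

```python
def density_matrix(states, probs):
    """Finite-basis form of (69): rho[i][j] = sum_m (i|m) P_m (m|j)."""
    dim = len(states[0])
    return [[sum(psi[i] * p * psi[j].conjugate() for psi, p in zip(states, probs))
             for j in range(dim)] for i in range(dim)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def trace(a):
    """'Integrate along the diagonal' in a discrete representation."""
    return sum(a[i][i] for i in range(len(a)))

s = 2 ** -0.5
states = [[1 + 0j, 0j], [s + 0j, s * 1j]]              # two normalized states (illustrative)
probs = [0.3, 0.7]
beta = [[1 + 0j, 0.5 - 0.2j], [0.5 + 0.2j, -2 + 0j]]   # a hypothetical Hermitian observable

rho = density_matrix(states, probs)

# Average of beta computed state by state: sum_m P_m (m|beta|m).
direct = sum(p * sum(psi[i].conjugate() * beta[i][j] * psi[j] for i in range(2) for j in range(2))
             for psi, p in zip(states, probs))
avg_beta_rho = trace(matmul(beta, rho))   # (71), first form
avg_rho_beta = trace(matmul(rho, beta))   # (71), second form
total = trace(rho)                        # (72)
diag = [rho[i][i].real for i in range(2)] # right-hand side of (73)
diag_direct = [sum(p * abs(psi[i]) ** 2 for psi, p in zip(states, probs)) for i in range(2)]

print(direct, avg_beta_rho, avg_rho_beta, total)
print(diag, diag_direct)
```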

As in the classical theory, we may take a density equal to k times the above ρ
and consider it as representing a Gibbs ensemble of k similar dynamical systems,
between which there is no mutual disturbance or interaction. We shall then have k
on the right-hand side of (72), and (71) will give the total average β for all
the members of the ensemble, while (73) will give the total probability of a member
of the ensemble having values for its ξ's in the neighbourhood of ξ', per unit range
of the ξ''s.

An important application of the foregoing theory is that it enables one to get
a clearer understanding of the significance of the normalization of a ψ labelled by
parameters that take on continuous ranges of values, as defined by equation (23)
of Chapter IV. Let us take a system with n degrees of freedom describable in
terms of canonical coordinates and momenta and suppose that it is in one or
other of the simultaneous eigenstates of all the momenta, the probability of its
being in an eigenstate belonging to eigenvalues for the p's between p' and p' + dp'
being P_{p'} dp'. Then in a representation in which the q's are diagonal, the density ρ
will be represented by


$$(q'|\rho|q'') = \int (q'|p')\,P_{p'}\,dp'\,(p'|q''). \qquad (74)$$


The (q'|p’)’s here are the q-representatives of the eigenstates of the momenta and 
are given by equation (43) of Chapter V. Thus 


$$(q'|\rho|q'') = h^{-n}\int P_{p'}\,dp'\,e^{i\sum_r p'_r(q'_r - q''_r)/\hbar}$$
and
$$(q'|\rho|q') = h^{-n}\int P_{p'}\,dp'. \qquad (75)$$


These (q'|p')'s are normalized in accordance with the rule for ψ's labelled by
parameters that take on continuous ranges of values, i.e.

$$\int (p'|q')\,dq'\,(q'|p'') = \delta(p' - p''),$$

and not so as to make the corresponding ψ's of length unity, which would require

$$\int (p'|q')\,dq'\,(q'|p') = 1.$$


In consequence our ρ does not satisfy equation (72). In fact (75) shows at once that
∫(q'|ρ|q') dq' is infinite, since the integrand is independent of q'. Thus ρ should be
considered as representing an ensemble of an infinite number of systems. The total
probability of a member of the ensemble having its q's in the neighbourhood of q'
per unit range of the q''s is given by (75), where it is expressed as an integral
over the momentum variables. The integrand here, namely h^{-n} P_{p'}, may thus
be interpreted, in a naive way, as the probable number of systems per unit of
phase space.




We can, however, get a different interpretation for P_{p'} by going back to
equation (74) and replacing in it the continuous ranges of values for the p’’s 
by discrete sets of points lying very close to one another, in accordance with 
the method explained at the end of §24. Equation (74) goes over into 


$$(q'|\rho|q'') = \sum_{p'} (q'|p')\,P_{p'}\,(p'|q''),$$


by the same arguments as led to (43) of §24. Here P_{p'} appears as the probable
number of systems in one of the eigenstates of momentum, these eigenstates
now forming a discrete set. Comparing these two meanings for P_{p'}, we see that
they will agree if we put a volume h^n of phase space equal to a discrete state.
Thus the normalization rule for the case of continuous parameters is equivalent to
counting a volume h^n of phase space as having the same weight as a discrete state.
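The rule of one discrete state per volume h^n of phase space can be illustrated for n = 1. The sketch below does not follow the construction of §24; it assumes instead the common device of periodic boundary conditions on an interval of length L, which makes the allowed momenta p = jh/L with j an integer. It counts these discrete states with |p| ≤ P and compares the count with the phase-space area 2PL divided by h (the units with h = 1 and the values of P and L are arbitrary choices).

```python
import math

def count_momentum_states(length, p_max, h=1.0):
    """Number of allowed momenta p = j * h / length (j an integer) with |p| <= p_max."""
    j_max = math.floor(p_max * length / h)
    return 2 * j_max + 1

def phase_space_area(length, p_max):
    """Area of the region 0 <= q <= length, -p_max <= p <= p_max."""
    return 2 * p_max * length

p_max, h = 3.0, 1.0  # illustrative values, in units with h = 1
ratios = []
for length in (10.0, 1000.0, 100000.0):
    n_states = count_momentum_states(length, p_max, h)
    cells = phase_space_area(length, p_max) / h  # number of volumes h in the region
    ratios.append(n_states / cells)
    print(f"L = {length:>8.0f}: {n_states} states, area/h = {cells:.0f}, ratio = {n_states / cells:.6f}")
```

The ratio tends to 1 as L grows, in line with counting a volume h of phase space as one discrete state.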




VII. MOTION IN A CENTRAL FIELD OF FORCE


38. Introduction of the Angular Momentum 


An atom consists of a massive positively charged nucleus together with a number
of electrons moving round, under the influence of the attractive force of the nucleus 
and their own mutual repulsions. An exact treatment of this dynamical system 
would be a very difficult mathematical problem. One can, however, gain some 
insight into the main features of the system by making the rough approximation of 
regarding each electron as moving independently in a certain central field of force, 
namely that of the nucleus, assumed fixed, together with some kind of average of 
the forces due to the other electrons. Thus our present problem of the motion of 
a particle in a central field of force forms a corner-stone in the theory of the atom. 

Let the Cartesian coordinates of the particle, referred to a system of axes with
the centre of force as origin, be x, y, z and the corresponding components of
momentum p_x, p_y, p_z. They satisfy the quantum conditions


$$[x, y] = 0, \qquad [x, p_x] = 1, \qquad [p_x, p_y] = 0,$$
etc. The Hamiltonian, with neglect of relativistic mechanics, will be of the form
$$H = \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) + V, \qquad (1)$$


where V, the potential energy, is a function only of (x^2 + y^2 + z^2).
We now introduce the components of angular momentum defined, as in
the classical theory, by
$$m_x = y p_z - z p_y, \qquad m_y = z p_x - x p_z, \qquad m_z = x p_y - y p_x, \qquad (2)$$
or by the vector equation m = x × p.
From these equations we obtain at once the identity 


$$m_x x + m_y y + m_z z = 0. \qquad (3)$$


We must evaluate the P.B.s of the angular momentum components with
the dynamical variables x, p_x, etc., and with each other. This we can do most
conveniently with the help of the laws (4) and (5) of §25, thus




$$[m_z, x] = [x p_y - y p_x, x] = -y[p_x, x] = y, \qquad [m_z, y] = [x p_y - y p_x, y] = x[p_y, y] = -x, \qquad (4)$$
$$[m_z, z] = [x p_y - y p_x, z] = 0, \qquad (5)$$
and similarly
$$[m_z, p_x] = p_y, \qquad [m_z, p_y] = -p_x, \qquad (6)$$
$$[m_z, p_z] = 0, \qquad (7)$$
with corresponding relations for m_x and m_y. Again
$$[m_y, m_z] = [z p_x - x p_z, m_z] = z[p_x, m_z] - [x, m_z]\,p_z = -z p_y + y p_z = m_x,$$
$$[m_z, m_x] = m_y, \qquad [m_x, m_y] = m_z. \qquad (8)$$


These results are all the same as in the classical theory. The sign in the results (4), 
(6) and (8) may easily be remembered from the rule that the + sign occurs when 
the three dynamical variables, consisting of the two in the P.B. on the left-hand 
side and the one forming the result on the right, are in the cyclic order (xyz) and
the − sign occurs otherwise.

From (4) and (5) we obtain
$$[m_z, x^2 + y^2 + z^2] = x[m_z, x] + [m_z, x]\,x + y[m_z, y] + [m_z, y]\,y = xy + yx - yx - xy = 0. \qquad (9)$$
Similarly from (6) and (7) we find
$$[m_z, p_x^2 + p_y^2 + p_z^2] = 0. \qquad (10)$$


Thus m_z commutes with (x^2 + y^2 + z^2) and with (p_x^2 + p_y^2 + p_z^2). It therefore commutes
with the Hamiltonian H which, according to (1), is a function of these two
dynamical variables only. Similarly m_x and m_y commute with H. Thus the angular
momentum is a constant of the motion, as in the classical theory.
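Relations (8), (9) and (10) can be verified mechanically by letting the operators act on polynomials in x, y, z, with p_x represented by −iħ ∂/∂x as in §27 and units with ħ = 1. This sketch is our own illustration: the dict-based polynomial representation, the test polynomial and the helper names are assumptions. Since the P.B. of two quantum variables u, v is (uv − vu)/iħ, the relation [m_x, m_y] = m_z appears below as m_x m_y − m_y m_x − i m_z = 0.

```python
from collections import defaultdict

X, Y, Z = 0, 1, 2  # coordinate indices

# A polynomial in x, y, z is a dict {(a, b, c): coefficient} for the term x^a y^b z^c.

def combine(*terms):
    """Sum of scalar * polynomial over (scalar, polynomial) pairs, dropping zeros."""
    out = defaultdict(complex)
    for scalar, poly in terms:
        for exps, coeff in poly.items():
            out[exps] += scalar * coeff
    return {e: c for e, c in out.items() if c != 0}

def deriv(poly, axis):
    """Partial derivative with respect to coordinate `axis`."""
    out = defaultdict(complex)
    for exps, coeff in poly.items():
        if exps[axis]:
            new = list(exps)
            new[axis] -= 1
            out[tuple(new)] += coeff * exps[axis]
    return dict(out)

def times_coord(poly, axis):
    """Multiply by the coordinate with index `axis`."""
    out = {}
    for exps, coeff in poly.items():
        new = list(exps)
        new[axis] += 1
        out[tuple(new)] = coeff
    return out

def ang_mom(poly, a):
    """m_a = x_b p_c - x_c p_b, (a, b, c) cyclic, with p = -i d/dq (hbar = 1), as in (2)."""
    b, c = (a + 1) % 3, (a + 2) % 3
    return combine((-1j, times_coord(deriv(poly, c), b)), (1j, times_coord(deriv(poly, b), c)))

def r2_times(poly):
    """Multiply by x^2 + y^2 + z^2."""
    return combine(*[(1, times_coord(times_coord(poly, q), q)) for q in (X, Y, Z)])

def laplacian(poly):
    """Laplacian; p_x^2 + p_y^2 + p_z^2 is minus this when hbar = 1."""
    return combine(*[(1, deriv(deriv(poly, q), q)) for q in (X, Y, Z)])

f = {(2, 1, 0): 1, (0, 1, 3): 3, (1, 0, 1): -2, (0, 1, 0): 1}  # x^2 y + 3 y z^3 - 2 x z + y

# (8): [m_x, m_y] = m_z as a P.B. means m_x m_y - m_y m_x - i m_z = 0.
check_8 = combine((1, ang_mom(ang_mom(f, Y), X)), (-1, ang_mom(ang_mom(f, X), Y)),
                  (-1j, ang_mom(f, Z)))
# (9) and (10): m_z commutes with x^2 + y^2 + z^2 and with the Laplacian.
check_9 = combine((1, ang_mom(r2_times(f), Z)), (-1, r2_times(ang_mom(f, Z))))
check_10 = combine((1, ang_mom(laplacian(f), Z)), (-1, laplacian(ang_mom(f, Z))))
print(check_8, check_9, check_10)
```

Any test polynomial would do; polynomials are used only so that all the arithmetic stays exact.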

Equations (8) may be put in the vector form
$$\mathbf{m} \times \mathbf{m} = i\hbar\,\mathbf{m}. \qquad (11)$$


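If we have several particles with angular momenta m_1, m_2, ..., each of them will
satisfy (11), thus
$$\mathbf{m}_r \times \mathbf{m}_r = i\hbar\,\mathbf{m}_r.$$
Further, any one of these angular momenta will commute with any other, so that
$$\mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r = 0 \qquad (r \neq s).$$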




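Hence if M = Σ_r m_r is the total angular momentum,
$$\mathbf{M} \times \mathbf{M} = \sum_{r,s} \mathbf{m}_r \times \mathbf{m}_s = \sum_r \mathbf{m}_r \times \mathbf{m}_r + \sum_{r<s} (\mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r) = i\hbar \sum_r \mathbf{m}_r = i\hbar\,\mathbf{M}.$$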


This result is of the same form as (11), so that the components of the total 
angular momentum M of any number of particles satisfy the same commutability 
relations as those of the angular momentum of a single particle. Thus (11) or (8) 
may be regarded as the general commutability relations satisfied by any angular 
momentum. They certainly hold when the angular momentum is that of a number 
of particles, and may be assumed to hold also for the angular momentum of 
a spinning body. 
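The additivity argument can be checked for two particles under an assumption made purely for illustration: each particle carries an angular momentum represented by ½ħ times the Pauli matrices (the form used for electron spin in §39), with ħ = 1. In this sketch the operators for the two particles act on the product space, built with a Kronecker product, and M = m_1 + m_2 is tested against M × M = iħM.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product: a acts on particle 1, b on particle 2."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def lin(*terms):
    """Linear combination: sum of scalar * matrix over (scalar, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(s * m[i][j] for s, m in terms) for j in range(n)] for i in range(n)]

CYCLIC = {"x": ("y", "z"), "y": ("z", "x"), "z": ("x", "y")}

def cross_error(A):
    """Largest deviation from A x A = i A (hbar = 1), where (A x A)_a = A_b A_c - A_c A_b."""
    worst = 0.0
    for a, (b, c) in CYCLIC.items():
        diff = lin((1, matmul(A[b], A[c])), (-1, matmul(A[c], A[b])), (-1j, A[a]))
        worst = max(worst, max(abs(v) for row in diff for v in row))
    return worst

I2 = [[1, 0], [0, 1]]
# Assumed single-particle angular momentum: one half times the Pauli matrices.
m = {"x": [[0, 0.5], [0.5, 0]], "y": [[0, -0.5j], [0.5j, 0]], "z": [[0.5, 0], [0, -0.5]]}
M = {a: lin((1, kron(m[a], I2)), (1, kron(I2, m[a]))) for a in "xyz"}  # M = m_1 + m_2

single_error, total_error = cross_error(m), cross_error(M)
print(f"single particle: {single_error:.1e}, total: {total_error:.1e}")
```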


39. Properties of Angular Momentum 
We shall here consider some general properties of any angular momentum m. 
We introduce a dynamical variable θ defined by
$$\theta = m_x^2 + m_y^2 + m_z^2,$$
having the meaning of the square of the magnitude of the vector m. With the help
of (8) we obtain
$$[m_z, \theta] = [m_z, m_x^2 + m_y^2] = [m_z, m_x]\,m_x + m_x[m_z, m_x] + [m_z, m_y]\,m_y + m_y[m_z, m_y]$$
$$= m_y m_x + m_x m_y - m_x m_y - m_y m_x = 0.$$
Thus m_z commutes with θ. Similarly m_x and m_y commute with θ. We shall
now assume θ is an observable and introduce a representation in which θ and m_z
are diagonal. Since any function f of the m's commutes with θ, i.e.
$$\theta f - f\theta = 0,$$
its representative will satisfy
$$\theta'(\theta' m_z'|f|\theta'' m_z'') - (\theta' m_z'|f|\theta'' m_z'')\,\theta'' = 0,$$
or
$$\{\theta' - \theta''\}(\theta' m_z'|f|\theta'' m_z'') = 0,$$
so that all its matrix elements (θ'm_z'|f|θ''m_z'') will vanish except those for
which θ' = θ'', or those which are diagonal in θ, as we may say. Thus if we express
any equation between functions of the m's in terms of representatives, the surviving
matrix elements will all be diagonal in θ and will refer to the same eigenvalue θ'
all through the equation. Hence in such equations we may count the dynamical
variable θ simply as the number θ'.

Taking now the equation 


$$(m_x + i m_y)\,m_z - m_z(m_x + i m_y) = -i\hbar m_y - \hbar m_x = -\hbar(m_x + i m_y),$$
or
$$(m_x + i m_y)\,m_z = (m_z - \hbar)(m_x + i m_y),$$


and expressing it in terms of representatives, we get
$$(\theta' m_z'|m_x + i m_y|\theta' m_z'')\,m_z'' = \{m_z' - \hbar\}(\theta' m_z'|m_x + i m_y|\theta' m_z'').$$


Thus all the matrix elements (θ'm_z'|m_x + i m_y|θ'm_z'') of the representative of m_x + i m_y
vanish except those for which m_z'' = m_z' − ħ. If we now express the equation


$$(m_x + i m_y)(m_x - i m_y) = m_x^2 + m_y^2 - i(m_x m_y - m_y m_x) = m_x^2 + m_y^2 + \hbar m_z = \theta - m_z^2 + \hbar m_z$$


in terms of representatives and equate the diagonal elements on each side referring
to the eigenvalues θ', m_z', so as to get
$$\sum_{m_z''} (\theta' m_z'|m_x + i m_y|\theta' m_z'')(\theta' m_z''|m_x - i m_y|\theta' m_z') = \theta' - m_z'^2 + \hbar m_z',$$

we shall have all the terms in the sum on the left-hand side vanishing except
(at most) the one for which m_z'' = m_z' − ħ. If m_z' − ħ is not an eigenvalue
of m_z, then all the terms in the sum will vanish without exception. Since m_x − i m_y
is the conjugate complex of m_x + i m_y, the surviving term is the square of the modulus
of (θ'm_z'|m_x + i m_y|θ'm_z' − ħ). Hence in any case θ' − m_z'^2 + ħm_z' is positive
or zero and, if m_z' − ħ is not an eigenvalue of m_z, it must be zero. Thus, considering
θ' − m_z'^2 + ħm_z' as θ' + ¼ħ^2 − (m_z' − ½ħ)^2, we can draw the conclusions that

(i) θ' + ¼ħ^2 is positive or zero,
(ii) for any θ' there is a minimum m_z' satisfying (m_z' − ½ħ)^2 = θ' + ¼ħ^2,
(iii) any other m_z' for this θ' is greater than the minimum one by an integral
multiple of ħ.




The above conclusions provide us with an example of a mathematical 
phenomenon which we have not met with previously, namely, that with two 
commuting observables, the permissible eigenvalues of one depend on what 
eigenvalue we assign to the other. This phenomenon may be understood as 
the two observables being not altogether independent, but partially functions of 
one another. 

By a similar piece of work to the above, based on the equation
$$(m_x - i m_y)(m_x + i m_y) = \theta - m_z^2 - \hbar m_z,$$
we can draw two further conclusions, namely,

(ii)' for any θ' there is a maximum m_z' satisfying (m_z' + ½ħ)^2 = θ' + ¼ħ^2,
(iii)' any other m_z' for this θ' is less than the maximum one by an integral
multiple of ħ.

From (ii) and (ii)' it follows that the minimum value of m_z' is −(θ' + ¼ħ^2)^{1/2} + ½ħ and
the maximum value is (θ' + ¼ħ^2)^{1/2} − ½ħ. The maximum value minus the minimum
value, i.e. 2(θ' + ¼ħ^2)^{1/2} − ħ, must be an integral multiple of ħ not less than zero.
Let us introduce the new dynamical variable k, defined by
$$k + \tfrac12\hbar = (m_x^2 + m_y^2 + m_z^2 + \tfrac14\hbar^2)^{1/2}, \qquad (12)$$
the positive square root being taken on the right-hand side, in accordance with
the definition at the end of §11. This equation, it may be noted, gives
$$k(k + \hbar) = m_x^2 + m_y^2 + m_z^2.$$
We now have
$$k = (\theta + \tfrac14\hbar^2)^{1/2} - \tfrac12\hbar,$$


so that the eigenvalues of k are integral or half odd integral multiples of ħ not less
than zero. For each eigenvalue k' of k the eigenvalues of m_z are
$$k', \; k' - \hbar, \; k' - 2\hbar, \; \ldots, \; -k' + \hbar, \; -k'.$$
From symmetry, m_x and m_y have the same eigenvalues as m_z.
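Conclusions (i)-(iii) and (ii)'-(iii)' are enough to write down explicit matrices. The sketch below assumes units with ħ = 1 and a real positive phase for each nonzero matrix element of m_x + i m_y (a conventional choice the text does not fix). It builds m_x, m_y, m_z for a given k' from |(θ'm_z'|m_x + i m_y|θ'm_z' − ħ)|^2 = θ' − m_z'^2 + ħm_z' with θ' = k'(k' + ħ), then checks (8) and k(k + ħ) = m_x^2 + m_y^2 + m_z^2. The function names are hypothetical.

```python
import math
from fractions import Fraction

def angular_momentum_matrices(k):
    """m_x, m_y, m_z (hbar = 1) for magnitude k, from the matrix elements
    |(m'|m_x + i m_y|m' - 1)|^2 = theta' - m'^2 + m' with theta' = k(k + 1).
    Basis order m_z' = k, k - 1, ..., -k; phases chosen real and positive."""
    k = Fraction(k)
    dim = int(2 * k) + 1
    mz_vals = [k - n for n in range(dim)]
    theta = k * (k + 1)
    up = [[0j] * dim for _ in range(dim)]                 # m_x + i m_y
    for row in range(dim - 1):
        mp = mz_vals[row]                                 # the column has m_z' = mp - 1
        up[row][row + 1] = complex(math.sqrt(float(theta - mp * mp + mp)))
    down = [[up[j][i].conjugate() for j in range(dim)] for i in range(dim)]  # m_x - i m_y
    mx = [[(up[i][j] + down[i][j]) / 2 for j in range(dim)] for i in range(dim)]
    my = [[(up[i][j] - down[i][j]) / 2j for j in range(dim)] for i in range(dim)]
    mz = [[complex(float(mz_vals[i])) if i == j else 0j for j in range(dim)] for i in range(dim)]
    return mx, my, mz, mz_vals

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)] for i in range(n)]

def check(k):
    """Largest deviations from (8) and from k(k + 1) = m_x^2 + m_y^2 + m_z^2."""
    mx, my, mz, vals = angular_momentum_matrices(k)
    m = {"x": mx, "y": my, "z": mz}
    dim = len(vals)
    comm_err = 0.0
    for a, b, c in (("x", "y", "z"), ("y", "z", "x"), ("z", "x", "y")):
        ab, ba = matmul(m[a], m[b]), matmul(m[b], m[a])
        # [m_a, m_b] = m_c as a P.B. means m_a m_b - m_b m_a = i m_c (hbar = 1)
        comm_err = max(comm_err, max(abs(ab[i][j] - ba[i][j] - 1j * m[c][i][j])
                                     for i in range(dim) for j in range(dim)))
    kk = float(Fraction(k) * (Fraction(k) + 1))
    squares = [matmul(m[a], m[a]) for a in "xyz"]
    cas_err = max(abs(sum(s[i][j] for s in squares) - (kk if i == j else 0))
                  for i in range(dim) for j in range(dim))
    return comm_err, cas_err, vals

for k in (Fraction(1, 2), 1, Fraction(3, 2), 2):
    comm_err, cas_err, vals = check(k)
    print(f"k = {k}: m_z eigenvalues {[str(v) for v in vals]}, errors {comm_err:.1e}, {cas_err:.1e}")
```

For k' = ½ the construction gives m_x, m_y, m_z equal to ½ times the Pauli matrices σ_x, σ_y, σ_z, the form met again in the discussion of electron spin.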

The dynamical variable k defined by (12) is the convenient one to use for
describing the magnitude of an angular momentum vector m. It is preferable to
the square root of θ on account of its simpler eigenvalues. The eigenvalues of k and
of the components m_x, m_y, m_z are always integral or half odd integral numbers
of quanta ħ. If, however, m is the angular momentum of a particle moving in
an orbit, then the eigenvalues of m_x, m_y, m_z and k must all be integral numbers
of quanta. To verify this we take m_z, which is now of the form given by the third
of equations (2), and put it in a representation in which the coordinates x, y, z
are diagonal. According to §27 its representative will be the differential operator
$$-i\hbar\left(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\right)$$
when operating to the right. The easiest way of obtaining the eigenvalues of
this operator is to transform it to the cylindrical coordinates z, ρ, φ, in which ρ
and φ are defined by x = ρ cos φ, y = ρ sin φ. It then becomes simply −iħ ∂/∂φ.
The eigenfunctions of this operator are obviously of the form a e^{inφ}, where a is
a function of z and ρ only and n is an integer.† The corresponding eigenvalue
for m_z is nħ, and is thus an integral multiple of ħ. The eigenvalues of k must then
also be integral multiples of ħ.

Although the angular momentum of orbital motion of a particle must have
integral eigenvalues, there is no reason why the spin angular momentum of
a particle should not have half odd integral eigenvalues, since a spin angular
momentum is not expressible in terms of coordinates and momenta in the form (2).
In fact experimental evidence shows that electrons and many kinds of atomic nuclei
have spin angular momenta with half odd integral eigenvalues. A further remark
about spin is that a spin angular momentum may have a magnitude k with only
one eigenvalue. This is possible since k commutes with the three components of
the angular momentum, so that we do not get any inconsistency by putting k
equal to a number. (This would not be possible for the k of an orbital angular
momentum, since for such a k there would exist the variables x, p_x, etc., with which
it does not commute.) It is found that all the more elementary particles of
atomic physics, such as electrons and atomic nuclei, do have spin angular momenta
with magnitudes k with only one eigenvalue. For the spin of electrons, k has
the one eigenvalue ½ħ. This results in the components m_x, m_y, m_z of the spin
angular momentum of an electron each having the eigenvalues ½ħ and −½ħ. These
components are thus of the form of ½ħ times the σ's of §19, since these σ's each have
the eigenvalues 1 and −1 and their commutability relations, namely equations (53)
of §19, are the same as (8), apart from the factors ½ħ. Theoretical reasons
for this particular spin angular momentum for an electron will be obtained in
Chapter XII. For the spins of the other elementary particles (except photons)
there is at present no theoretical information, and one has to depend entirely on
experimental evidence.

The components m_x, m_y, m_z of an angular momentum in different directions do
not commute with each other, so that one cannot in general assign numerical values 
to them simultaneously. One can at most give a numerical value to the component 


†It is a general requirement of our theory of representations that representatives of ψ's and φ's
shall always be single-valued. Thus our eigenfunctions must be single-valued.




in one particular direction. The state of the system is then said to be spatially 
quantized in that direction. There is, however, one special case in which one can 
assign numerical values to all the components simultaneously, namely, one can 
give them all the value zero, since this will not contradict the commutability 
relations (8). The resulting state of zero angular momentum, with k = 0, is one 
that is spatially quantized simultaneously in all directions and is, according to 
the work at the end of §29, spherically symmetrical. 


40. Transition to Polar Coordinates


For further discussion of the problem of motion in a central field of force it is
convenient to introduce polar dynamical variables. We introduce first the radius r,
defined as the positive square root
$$r = (x^2 + y^2 + z^2)^{1/2}.$$
If we evaluate its P.B.s with p_x, p_y and p_z, we obtain, with the help of formula (38)
of Chapter V,
$$[r, p_x] = \frac{\partial r}{\partial x} = \frac{x}{r}, \qquad [r, p_y] = \frac{y}{r}, \qquad [r, p_z] = \frac{z}{r},$$
the same as in the classical theory. We introduce also the dynamical variable p_r
defined by
$$p_r = r^{-1}(x p_x + y p_y + z p_z - i\hbar). \qquad (13)$$
Its P.B. with r is given by
$$r[r, p_r] = [r, r p_r] = [r, x p_x + y p_y + z p_z] = x[r, p_x] + y[r, p_y] + z[r, p_z] = x\frac{x}{r} + y\frac{y}{r} + z\frac{z}{r} = r.$$
Hence
$$[r, p_r] = 1,$$
or
$$r p_r - p_r r = i\hbar.$$


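We can now see that p_r is real, since its conjugate complex p̄_r is given by
$$\bar p_r = (p_x x + p_y y + p_z z + i\hbar)\,r^{-1} = (x p_x + y p_y + z p_z - 2i\hbar)\,r^{-1} = (r p_r - i\hbar)\,r^{-1} = p_r,$$
the last step following from r p_r − p_r r = iħ.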


[The 4th edition has 'The commutation relation between r and p_r is just the one for a
canonical coordinate and momentum, namely equation (10) of §22 [of the 4th edition]. This
makes p_r like the momentum conjugate to the r coordinate, but it is not exactly equal to
this momentum because it is not real, its conjugate complex being
$$\bar p_r = (p_x x + p_y y + p_z z)\,r^{-1} = (x p_x + y p_y + z p_z - 3i\hbar)\,r^{-1} = (r p_r - 3i\hbar)\,r^{-1} = p_r - 2i\hbar\,r^{-1}.$$
Thus p_r − iħr^{-1} is real and is the true momentum conjugate to r, the radial momentum.']




The commutability relation between r and p_r is just the one for a pair
of canonically conjugate variables, namely equation (12) of Chapter V.
Now the eigenvalues of r, from its definition as a positive square root, must be all
positive or zero, so that we have obtained a contradiction to the result, proved at
the end of §26, that a dynamical variable can have a canonical conjugate only if
its eigenvalues include all numbers from −∞ to ∞. This inconsistency arises from
the fact that the dynamical variable p_r defined by (13) does not strictly exist,
since r has the eigenvalue zero so that r^{-1} does not strictly exist. In spite of this
defect the dynamical variable p_r is a useful one for the study of motion in a central
field of force. Our equations, which will often involve p_r and will sometimes
involve r^{-1} in other ways than through p_r, will be liable to be inaccurate, but only
in so far as they apply to the one point r = 0. It will be necessary to make
a special investigation of solutions of the wave equation obtained with the help of
polar variables to see whether they are satisfactory at the point r = 0. We shall do
this later in this section. It was mentioned at the end of §26 that e^{cq} is inadmissible
as an operator in quantum mechanics if c is real and the eigenvalues of q extend
from −∞ to ∞, but since the eigenvalues of r extend only from 0 to ∞, e^{cr} is
admissible if c is negative.

We can easily verify that our two new dynamical variables r and p_r commute
with the angular momentum. Equation (9) shows us that m_z commutes with r^2.
It must therefore commute also with r, since r is defined as a square-root function
so that, from a theorem on page 52, everything that commutes with r^2 commutes
also with r. Again, for p_r we have
$$r[p_r, m_z] = [r p_r, m_z] = [x p_x + y p_y, m_z] = -x p_y - y p_x + y p_x + x p_y = 0.$$
Thus r and p_r commute with m_z, and hence also with m_x and m_y and with k.
We can now express the Hamiltonian in terms of our radial dynamical variables
r and p_r and also k. We have, if Σ_{xyz} denotes a sum over cyclic permutations of
the suffixes x, y, z,
$$\begin{aligned}
k(k+\hbar) &= \sum_{xyz} m_z^2 = \sum_{xyz} (x p_y - y p_x)^2 \\
&= \sum_{xyz} (x p_y x p_y + y p_x y p_x - x p_y y p_x - y p_x x p_y) \\
&= \sum_{xyz} (x^2 p_y^2 + y^2 p_x^2 - x p_x p_y y - y p_y p_x x + x^2 p_x^2 - x p_x p_x x - 2i\hbar\, x p_x) \\
&= (x^2 + y^2 + z^2)(p_x^2 + p_y^2 + p_z^2) - (x p_x + y p_y + z p_z)(p_x x + p_y y + p_z z + 2i\hbar) \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - (r p_r + i\hbar)\,r p_r \\
&= r^2(p_x^2 + p_y^2 + p_z^2) - r^2 p_r^2.
\end{aligned}$$


Hence
$$H = \frac{1}{2m}\left(p_r^2 + \frac{k(k+\hbar)}{r^2}\right) + V. \qquad (14)$$


This form for H is such that k commutes not only with H, as is necessary since
k is a constant of the motion, but also with every dynamical variable occurring
in H, namely both r and p_r. Thus in dealing with the Hamiltonian in this form
we can treat k as a number. The permissible numbers we can take for k are its
eigenvalues and are thus positive integral multiples of ħ or zero. The equation for
the representatives of the stationary states will now read†


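$$\left\{\frac{1}{2m}\left(-\hbar^2\frac{\partial^2}{\partial r^2} + \frac{k(k+\hbar)}{r^2}\right) + V\right\}(r|) = H'(r|), \qquad (15)$$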


the single variable r in the wave function (r|) being sufficient when k is 
counted as a number. Any value of the parameter H’ for which this equation, 
with a permissible value for k, has a solution (satisfying the boundary conditions 
to be discussed later) is a possible energy-level of the system. The energy-levels 
(except those for which k = 0) each belong to several independent stationary states, 
corresponding to the various possible eigenvalues of a Cartesian component of 
the angular momentum. The number of these states, for any value of k, is the odd
number (2k/ħ + 1).

If we write down the equation for the representatives of the stationary states 
in the original Cartesian coordinates x, y, z, we shall have 


$$\left\{-\frac{\hbar^2}{2m}\nabla^2 + V\right\}(xyz|) = H'(xyz|), \qquad (16)$$


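where ∇^2 is the Laplacian operator ∂^2/∂x^2 + ∂^2/∂y^2 + ∂^2/∂z^2. This becomes,
on transforming to polar coordinates r, θ, φ,

$$\left\{-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right) + V\right\}(r\theta\phi|) = H'(r\theta\phi|).$$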


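The solutions of this equation are of the form
$$(r\theta\phi|) = \chi(r)\,S_n(\theta, \phi),$$
where S_n is a spherical harmonic of order n, satisfying

$$\left(\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right)S_n(\theta, \phi) = -n(n+1)\,S_n(\theta, \phi),$$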


†We are here omitting the primes from the variables in the wave functions. This is often
convenient when one can do so without confusion, it being understood that the variables 
in the wave function, or in any representative, denote eigenvalues of observables and not 
the observables themselves. 




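n being an integer, and χ(r) is a function of r only, satisfying

$$\left\{-\frac{\hbar^2}{2m}\left(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} - \frac{n(n+1)}{r^2}\right) + V\right\}\chi(r) = H'\chi(r). \qquad (17)$$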


This equation, like (15), is such that the values of H’ for which it has a solution 
are the energy-levels of the system. 

The equivalence of equations (15) and (17) may be seen from the fact that if
in (15) we put (r|) = rχ(r) we obtain just equation (17) with n = k/ħ. The fact
that the two eigenfunctions (r|) and χ(r) are not identical but differ by this factor r
is due to their different physical interpretations. A solution (r|) of (15) represents
a state for which the probability of the particle lying in the spherical shell between r
and r + dr is proportional to |(r|)|^2 dr. On the other hand, a solution (xyz|) of (16)
represents a state for which the probability of the particle lying in a small volume
dx dy dz is |(xyz|)|^2 dx dy dz or |χ(r)S_n(θ, φ)|^2 dx dy dz, so that, since
dx dy dz = r^2 sin θ dr dθ dφ, the probability of its lying in the spherical shell between
r and r + dr is proportional to r^2|χ(r)|^2 dr. Thus the physical interpretations require
(r|) to be proportional to rχ(r).

It should be noticed that not every solution of (17), when multiplied by 
the appropriate spherical harmonic, will give a solution of (16), as it may fail 
to satisfy (16) at the origin. One can see most clearly how this comes about 
by considering the special case for which the potential V vanishes, giving us 
the problem of the free particle. If we further take H’ = 0, equation (16) reduces to 


$$\nabla^2(xyz|) = 0 \qquad (18)$$


and equation (17) to 


$$\frac{d^2\chi}{dr^2} + \frac{2}{r}\frac{d\chi}{dr} - \frac{n(n+1)}{r^2}\chi = 0. \qquad (19)$$


Now a solution of (19) for n = 0 is χ(r) = 1/r, but this solution multiplied
by the appropriate spherical harmonic S_0 = 1 does not satisfy (18), since,
although ∇^2(1/r) vanishes for any finite value of r, its integral through any volume
about the origin is −4π, and hence

$$\nabla^2(1/r) = -4\pi\,\delta(x)\,\delta(y)\,\delta(z).$$


Thus the solution χ(r) = 1/r of (19) does not represent a stationary state of
the system. Again the solution χ(r) = 1/r^2 of (19) for n = 1, when multiplied
by the spherical harmonic S_1 = cos θ, gives a wave function (xyz|), the integral
of the square of whose modulus over any volume, however small, that contains
the origin is infinite. This wave function must represent a state for which
the particle is certainly at the origin and this cannot be a stationary state of
zero energy for the problem of the free particle. Similarly for arbitrary n in
equation (19), of the two solutions χ(r) = r^n and χ(r) = r^{-n-1}, the second will
not give the representative of a stationary state of the system.
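The value −4π used above can be checked with Gauss's theorem: the integral of ∇^2(1/r) over a volume equals the flux of ∇(1/r) = −(x, y, z)/r^3 through its boundary. The sketch below, an illustration of our own, evaluates that flux through the surface of a cube centred at the origin with a midpoint rule on one face, multiplied by 6 by symmetry. The grid size and the two cube sizes are arbitrary choices.

```python
import math

def face_flux(half_side, n):
    """Midpoint-rule flux of grad(1/r) = -(x, y, z)/r^3 through the cube face z = half_side."""
    step = 2 * half_side / n
    total = 0.0
    for i in range(n):
        x = -half_side + (i + 0.5) * step
        for j in range(n):
            y = -half_side + (j + 0.5) * step
            r_cubed = (x * x + y * y + half_side * half_side) ** 1.5
            total += -half_side / r_cubed * step * step  # outward normal is +z
    return total

# By symmetry all six faces carry the same flux.
flux_large = 6 * face_flux(1.0, 400)
flux_small = 6 * face_flux(0.01, 400)
print(flux_large, flux_small, -4 * math.pi)
```

The result is the same for a large and a very small cube, as expected for a contribution concentrated at the origin.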

It thus appears that equation (17) is not adequate to replace equation (16)
as the necessary and sufficient condition for the representative of a stationary
state. Equation (17) must be supplemented by a suitable boundary condition at
the point r = 0. Any solution χ(r) of (17) for which the integral ∫_0 r^2|χ(r)|^2 dr
is not convergent must certainly be rejected, and also some for which this integral
is convergent, namely those which, when operated on by ∇^2, give an infinite result
involving the δ function at the origin. These conditions show that only those
solutions are to be allowed which, if they tend to infinity as r → 0, do so more
slowly than 1/r. The corresponding boundary condition for the function (r|) of
equation (15) is that it shall tend to zero as r → 0.
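The solutions χ = r^n and χ = r^{-n-1} of (19), and the two tests applied to them, can be tabulated. In this sketch, an illustration only, a power χ = r^p is substituted into (19), which gives [p(p + 1) − n(n + 1)]r^{p−2}; the code then checks whether ∫_0 r^2|χ|^2 dr converges and whether any divergence is slower than 1/r, the rule stated above. The function names are hypothetical.

```python
def indicial(p, n):
    """chi = r**p in (19) gives [p(p + 1) - n(n + 1)] r**(p - 2); zero for a solution."""
    return p * (p + 1) - n * (n + 1)

def norm_converges_at_origin(p):
    """Does the integral of r**2 |chi|**2 = r**(2p + 2) converge at r = 0?"""
    return 2 * p + 2 > -1

def allowed_at_origin(p):
    """Rule from the text: any divergence as r -> 0 must be slower than 1/r,
    equivalently (r|) = r * chi = r**(p + 1) must tend to zero."""
    return p > -1

table = []
for n in range(4):
    for p in (n, -n - 1):
        table.append((n, p, indicial(p, n) == 0,
                      norm_converges_at_origin(p), allowed_at_origin(p)))
for row in table:
    print("n=%d  chi=r^%d  solves (19): %s  norm finite: %s  allowed: %s" % row)
```

The row n = 0, p = −1 is the case discussed in the text: the integral converges, yet χ = 1/r is rejected because ∇^2(1/r) contains a δ function.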

There are also boundary conditions for the eigenfunction as* r → ∞. If we are
interested only in 'closed' states, i.e. states for which the particle does not go off
to infinity, we must restrict the integral ∫^∞ |(r|)|^2 dr or ∫^∞ r^2|χ(r)|^2 dr to be
convergent. These closed states, however, are not the only ones that are physically
permissible, as we can also have states in which the particle arrives from infinity,
is scattered by the central field of force, and goes off to infinity again. For these
states the wave function (xyz|) may remain finite as r → ∞. Such states will be
dealt with in Chapter IX under the heading of collision problems. In any case
the eigenfunction (xyz|) must not tend to infinity as r → ∞, or it will represent
a state that has no physical meaning.


41. Energy-levels of the Hydrogen Atom 


The above analysis may be applied to the problem of the hydrogen atom
with neglect of the relativistic variation of mass with velocity and the spin of
the electron. The potential energy V is now† −e^2/r, so that equation (15) becomes

$$\left\{\frac{d^2}{dr^2} - \frac{k(k+1)}{r^2} + \frac{2me^2}{\hbar^2 r}\right\}(r|) = -\frac{2mH'}{\hbar^2}(r|), \qquad (20)$$

when written in terms of a new dynamical variable k, equal to ħ^{-1} times
the previous k. A thorough investigation of this equation has been given by
Erwin Schrödinger.‡ We shall here obtain its eigenvalues H' from a consideration
of its eigenfunctions expressed in the form of power series.


*Original has 'at r = ∞'.


†The e here, denoting minus the charge on an electron, is, of course, to be distinguished from
the e denoting the base of exponentials.

‡Schrödinger, E. (1926). 'Quantisierung als Eigenwertproblem', Annalen der Physik, 384(4),
pp. 361-376. doi:10.1002/andp.19263840404




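It is convenient to put
$$(r|) = f(r)\,e^{-r/a}, \qquad (21)$$
introducing the new function f(r), where a is one or other of the square roots
$$a = \pm\left(-\frac{\hbar^2}{2mH'}\right)^{1/2}. \qquad (22)$$
Equation (20) now becomes
$$\left\{\frac{d^2}{dr^2} - \frac{2}{a}\frac{d}{dr} - \frac{k(k+1)}{r^2} + \frac{2me^2}{\hbar^2}\frac{1}{r}\right\}f(r) = 0. \qquad (23)$$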


We look for a solution of this equation in the form of a power series
$$f(r) = \sum_s c_s r^s, \qquad (24)$$

in which consecutive values for s differ by unity although these values themselves
need not be integers. On substituting (24) in (23) we obtain

$$\sum_s c_s\left\{s(s-1)\,r^{s-2} - \frac{2s}{a}\,r^{s-1} - k(k+1)\,r^{s-2} + \frac{2me^2}{\hbar^2}\,r^{s-1}\right\} = 0,$$

which gives, on equating to zero the coefficient of r^{s-2}, the following relation
between successive coefficients c_s,

$$c_s\{s(s-1) - k(k+1)\} = c_{s-1}\left\{\frac{2(s-1)}{a} - \frac{2me^2}{\hbar^2}\right\}. \qquad (25)$$


We saw in the preceding section that only those eigenfunctions (r|) are allowed
that tend to zero with r and hence, from (21), f(r) must tend to zero with r.
The series (24) must therefore terminate on the side of small s and the minimum
value of s must be greater than zero. Now the only possible minimum values of s are
those that make the coefficient of c_s in (25) vanish, i.e. k + 1 and −k, and the second
of these is negative or zero. Thus the minimum value of s must be k + 1. Since k
is always an integer, the values of s will all be integers. The series (24) will in
general extend to infinity on the side of large s. For large values of s the ratio of
successive terms is
$$\frac{c_s r}{c_{s-1}} \approx \frac{2r}{sa}$$
according to (25). Thus the series (24) will always converge, as the ratios of
the higher terms to one another are the same as for the series
$$\sum_s \frac{1}{s!}\left(\frac{2r}{a}\right)^s, \qquad (26)$$
which converges to e^{2r/a}.
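The approach of the coefficient ratio to 2/(sa) can be seen by iterating (25) directly. The sketch below assumes units with me^2/ħ^2 = 1 (so a is measured in units of ħ^2/me^2) and the illustrative values a = 2.5, k = 1; because a is not an integer in these units, the series never terminates. It prints c_s/c_{s−1} divided by 2/(sa), which should approach 1. The function name is hypothetical.

```python
def coefficient_ratios(a, k, s_max, me2_over_hbar2=1.0):
    """c_s / c_{s-1} from recurrence (25) for s = k + 2, ..., s_max."""
    return {s: (2 * (s - 1) / a - 2 * me2_over_hbar2) / (s * (s - 1) - k * (k + 1))
            for s in range(k + 2, s_max + 1)}

a, k = 2.5, 1  # illustrative values; 2(s - 1)/a = 2 never holds for integer s here
ratios = coefficient_ratios(a, k, 2000)
scaled = {s: ratios[s] * s * a / 2 for s in (10, 100, 1000, 2000)}
for s, value in scaled.items():
    print(f"s = {s:5d}: (c_s / c_(s-1)) / (2 / (s a)) = {value:.5f}")
```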




We must now examine how our solution $(r|)$ behaves for large values of $r$.
We must distinguish between the two cases of $H'$ positive and $H'$ negative. For $H'$
negative, $a$ given by (22) will be real. Suppose we take the positive value for $a$.
Then as $r \to \infty$ the sum of the series (24) will tend to infinity according to the same
law as the sum of the series (26), i.e. the law $e^{2r/a}$. Thus, from (21), $(r|)$ will tend
to infinity according to the law $e^{r/a}$ and will not represent a physically possible
state. There is therefore in general no permissible solution of (20) for negative
values of $H'$. An exception arises, however, whenever the series (24) terminates
on the side of large $s$, in which case the boundary conditions are all satisfied.
The condition for this termination of the series is that the coefficient of $c_{s-1}$ in (25)
shall vanish for some value of the suffix $s-1$ not less than its minimum value $k+1$,
which is the same as the condition that

$$\frac{s}{a} - \frac{me^2}{\hbar^2} = 0$$

for some integer $s$ not less than $k+1$. With the help of (22) this condition becomes

$$H' = -\frac{me^4}{2\hbar^2 s^2}, \tag{27}$$
and is thus a condition for the energy-level $H'$. Since $s$ may be any positive
integer, the formula (27) gives a discrete set of negative energy-levels for
the hydrogen atom. These are in agreement with experiment. Each of them
(except the lowest one, $s = 1$) may occur with various possible values for $k$, namely,
any positive or zero integer less than $s$. This multiplicity is in addition to that
mentioned in the preceding section arising from the various possible values for
a component of angular momentum, which multiplicity occurs with any central
field of force. The $k$ multiplicity occurs only with an inverse square law of force
and even then is removed when one takes relativistic mechanics into account,
as will be found in Chapter XII. The solution of (20) when $H'$ satisfies (27) tends
to zero exponentially as $r \to \infty$ and thus represents a closed state (corresponding
to an elliptic orbit in Bohr's theory).
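The termination condition can be illustrated in the same way. The sketch below again assumes $m = e = \hbar = 1$, so that the condition above gives $a = n$ for an integer $n$ and (27) reads $H' = -1/2n^2$; the helper names are hypothetical. For each allowed $k < n$ the series is generated with $a = n$, and the last non-vanishing coefficient is always $c_n$. The count $\sum_{k<n}(2k+1) = n^2$ also includes the multiplicity arising from the component of angular momentum.

```python
from fractions import Fraction

def series_coefficients(k, a, s_max):
    """c_s for s = k+1, ..., s_max from recurrence (25), units m = e = hbar = 1."""
    c = {k + 1: Fraction(1)}
    for s in range(k + 2, s_max + 1):
        c[s] = c[s - 1] * (Fraction(2 * (s - 1)) / a - 2) / (s * (s - 1) - k * (k + 1))
    return c

def highest_power(k, n, s_max=40):
    """Largest s with c_s != 0 when a = n, i.e. when the series terminates at s = n."""
    c = series_coefficients(k, Fraction(n), s_max)
    return max(s for s, value in c.items() if value != 0)

def energy_level(n):
    """Formula (27) in the same units: H' = -1 / (2 n^2)."""
    return Fraction(-1, 2 * n * n)

for n in range(1, 5):
    ks = range(n)                                  # k = 0, 1, ..., n-1
    states = sum(2 * k + 1 for k in ks)            # with the m_z multiplicity
    print(n, energy_level(n), [highest_power(k, n) for k in ks], states)
```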

For any positive values of $H'$, $a$ given by (22) will be† imaginary. The series (24),
which is roughly the same as the series (26), will now have a sum that remains
finite as $r \to \infty$. Thus $(r|)$ given by (21) will now remain finite as $r \to \infty$
and will therefore be a permissible solution of (20), since it will correspond to
an eigenfunction $(xyz|)$ that tends to zero according to the law $1/r$ as $r \to \infty$.
Hence in addition to the discrete set of negative energy-levels (27), all positive
energy-levels are allowed. The states of positive energy are not closed, since their
representatives $(r|)$ do not make the integral $\int_0^\infty |(r|)|^2\,dr$ converge. (These states
correspond to the hyperbolic orbits of Bohr's theory.)


†[‘pure’ omitted]




42. Selection Rules 


If a dynamical system is set up in a certain stationary state, it will remain 
in that stationary state so long as it is not acted upon by outside forces. 
Any atomic system in practice, however, frequently gets acted upon by external 
electromagnetic fields, under whose influence it is liable to cease to be in one 
stationary state and to make a transition to another. The theory of such transitions 
will be developed in §§47 and 48. A result of this theory is that, to a high 
degree of accuracy, transitions between two states cannot occur under the influence 
of electromagnetic radiation if, in a Heisenberg representation with these two 
stationary states as two of the basic states, the matrix element, referring to these 
two states, of the representative of the total electric displacement D of the system 
vanishes. Now it happens for many atomic systems that the great majority of 
the matrix elements of D in a Heisenberg representation do vanish, and hence 
there are severe limitations on the possibilities for transitions. The rules that 
express these limitations are called selection rules. 

The idea of selection rules can be refined by a more detailed application 
of the results of the theory of §48, according to which the matrix elements
of the different Cartesian components of the vector D are associated with 
different states of polarization of the electromagnetic radiation. The nature of 
this association is just what one would get if one considered the matrix elements, 
or rather their real parts, as the amplitudes of harmonic oscillators which interact 
with the field of radiation according to classical electrodynamics. We shall consider 
some examples to illustrate this. 

There is a general method for obtaining all selection rules, as follows. Let us call
the constants of the motion which are diagonal in our Heisenberg representation $\alpha$'s
and let $D$ be one of the Cartesian components of $\mathbf{D}$. We must obtain an algebraic
equation connecting $D$ and the $\alpha$'s which does not involve any dynamical variables
other than $D$ and the $\alpha$'s and which is linear in $D$. Such an equation will be of
the form

$$\sum_r f_r D g_r = 0, \tag{28}$$

where the $f_r$'s and $g_r$'s are functions of the $\alpha$'s only. If this equation is expressed
in terms of representatives, it gives us

$$\sum_r f_r(\alpha')\,(\alpha'|D|\alpha'')\,g_r(\alpha'') = 0,$$

or

$$(\alpha'|D|\alpha'')\sum_r f_r(\alpha')\,g_r(\alpha'') = 0,$$

which shows that $(\alpha'|D|\alpha'') = 0$ unless

$$\sum_r f_r(\alpha')\,g_r(\alpha'') = 0. \tag{29}$$






This last equation, giving the connexion which must exist between $\alpha'$ and $\alpha''$
in order that $(\alpha'|D|\alpha'')$ may not vanish, constitutes the selection rule, so far as
the component $D$ of $\mathbf{D}$ is concerned.

Our work on the harmonic oscillator in §36, in connexion with equations (52) 
and (53) there, provides an example of a selection rule. If the harmonic oscillator 
carries an electric charge, its electric displacement D will be proportional to q. 
The selection rule is then given by equation (53) there, and is that only those 
transitions can take place in which the energy $H$ changes by a single quantum $\hbar\omega$.

We shall now obtain the selection rules for $m_z$ and $k$ for an electron moving
in a central field of force. The components of electric displacement are here
proportional to the Cartesian coordinates $x$, $y$, $z$. Taking first $m_z$, we have that
$m_z$ commutes with $z$, or that

$$m_z z - z m_z = 0.$$

This is an equation of the required type (28), giving us the selection rule

$$m_z' - m_z'' = 0$$

for the $z$-component of the displacement. Again, from equations (4) we have

$$[m_z, [m_z, x]] = [m_z, y] = -x,$$

or

$$m_z^2 x - 2 m_z x m_z + x m_z^2 - \hbar^2 x = 0,$$

which is also of the type (28) and gives us the selection rule

$$m_z'^2 - 2 m_z' m_z'' + m_z''^2 - \hbar^2 = 0,$$

or

$$(m_z' - m_z'' - \hbar)(m_z' - m_z'' + \hbar) = 0$$

for the $x$-component of the displacement. The selection rule for the $y$-component
is the same. Thus our selection rules for $m_z$ are that in transitions associated with
radiation with a polarization corresponding to an electric dipole in the $z$-direction,
$m_z'$ cannot change, while in transitions associated with a polarization corresponding
to an electric dipole in the $x$-direction or $y$-direction, $m_z'$ must change by $\pm\hbar$.
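Criterion (29) is mechanical enough to be written as a short function. The Python sketch below uses a hypothetical helper `allowed_pairs`, takes $\hbar = 1$, and uses an illustrative range of eigenvalues $m_z' = -2\hbar, \ldots, 2\hbar$. It receives the functions $f_r$ and $g_r$ of an equation of type (28) and lists the pairs $(\alpha', \alpha'')$ for which a non-vanishing matrix element is allowed. Applied to the two equations above, it reproduces the two selection rules for $m_z$.

```python
def allowed_pairs(labels, f, g):
    """Pairs (a1, a2) for which criterion (29) permits (a1|D|a2) != 0, given the
    functions f_r and g_r of an equation sum_r f_r D g_r = 0 of type (28)."""
    return [(a1, a2) for a1 in labels for a2 in labels
            if sum(fr(a1) * gr(a2) for fr, gr in zip(f, g)) == 0]

hbar = 1                       # units with hbar = 1 (assumption of the sketch)
m_values = range(-2, 3)        # illustrative eigenvalues m_z' = -2, ..., 2

# z-component: m_z z - z m_z = 0, so f = (m, -1) and g = (1, m)
f_z = [lambda m: m, lambda m: -1]
g_z = [lambda m: 1, lambda m: m]

# x-component: m_z^2 x - 2 m_z x m_z + x m_z^2 - hbar^2 x = 0
f_x = [lambda m: m * m, lambda m: -2 * m, lambda m: 1, lambda m: -hbar ** 2]
g_x = [lambda m: 1, lambda m: m, lambda m: m * m, lambda m: 1]

z_pairs = allowed_pairs(m_values, f_z, g_z)
x_pairs = allowed_pairs(m_values, f_x, g_x)
print(z_pairs)     # only pairs with m_z' = m_z''
print(x_pairs)     # only pairs with m_z' - m_z'' = +1 or -1
```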

We can determine more accurately the state of polarization of the radiation
associated with a transition in which $m_z'$ changes by $\pm\hbar$, by considering
the condition for the non-vanishing of matrix elements of $x+iy$ and $x-iy$. We have

$$[m_z, x+iy] = y - ix = -i(x+iy),$$

or

$$m_z(x+iy) - (x+iy)(m_z + \hbar) = 0,$$

which is again of the type (28). It gives

$$m_z' - m_z'' - \hbar = 0$$

as the condition that $(m_z'|x+iy|m_z'')$ shall not vanish. Similarly,

$$m_z' - m_z'' + \hbar = 0$$

is the condition that $(m_z'|x-iy|m_z'')$ shall not vanish. Hence

$$(m_z'|x-iy|m_z'-\hbar) = 0,$$

or

$$(m_z'|x|m_z'-\hbar) = i\,(m_z'|y|m_z'-\hbar) = (a+ib)\,e^{i\omega t},$$

say, $a$, $b$, and $\omega$ being real, and similarly

$$(m_z'-\hbar|x|m_z') = -i\,(m_z'-\hbar|y|m_z') = (a-ib)\,e^{-i\omega t}.$$

Thus the vector $\tfrac12\{(m_z'|\mathbf{D}|m_z'-\hbar) + (m_z'-\hbar|\mathbf{D}|m_z')\}$, which determines the state
of polarization of the radiation associated with transitions for which $m_z'' = m_z'-\hbar$,
has the following three components:

$$\begin{aligned}
\tfrac12\{(m_z'|x|m_z'-\hbar) + (m_z'-\hbar|x|m_z')\} &= \tfrac12\{(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} \\
&= a\cos\omega t - b\sin\omega t, \\
\tfrac12\{(m_z'|y|m_z'-\hbar) + (m_z'-\hbar|y|m_z')\} &= \tfrac12 i\{-(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} \\
&= a\sin\omega t + b\cos\omega t, \\
\tfrac12\{(m_z'|z|m_z'-\hbar) + (m_z'-\hbar|z|m_z')\} &= 0.
\end{aligned} \tag{30}$$


From the form of these components we see that the associated radiation moving
in the $z$-direction will be circularly polarized, that moving in any direction
in the $xy$-plane will be linearly polarized in this plane, and that moving in
intermediate directions will be elliptically polarized. The direction of circular
polarization for radiation moving in the $z$-direction will depend on whether $\omega$
is positive or negative, and this will depend on which of the two states $m_z'$ or
$m_z'' = m_z' - \hbar$ has the greater energy.
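Equations (30) can be checked by evaluating the matrix elements directly. The sketch below uses illustrative values $a = 0.6$, $b = 0.8$, $\omega = 2$ (not from the text). It builds the elements from $(m_z'|x|m_z'-\hbar) = (a+ib)e^{i\omega t}$ and the Hermitian property of $x$ and $y$, and confirms that the resulting vector traces a circle of radius $(a^2+b^2)^{1/2}$ in the $xy$-plane. This circular motion is why radiation moving along $z$ is circularly polarized.

```python
import cmath
import math

def polarization_components(a, b, omega, t):
    """The three components (30) of 1/2{(m'|D|m'-hbar) + (m'-hbar|D|m')}, taking
    D proportional to (x, y, z) and (m'|x|m'-hbar) = (a + ib) e^{i omega t}."""
    x_up = (a + 1j * b) * cmath.exp(1j * omega * t)   # (m'|x|m'-hbar)
    y_up = -1j * x_up                                  # since (m'|x|..) = i (m'|y|..)
    x_down = x_up.conjugate()                          # (m'-hbar|x|m'), x Hermitian
    y_down = y_up.conjugate()                          # (m'-hbar|y|m'), y Hermitian
    z_part = 0.0                                       # z cannot change m_z'
    return ((x_up + x_down).real / 2, (y_up + y_down).real / 2, z_part)

a, b, omega = 0.6, 0.8, 2.0        # illustrative values, not from the text
for t in (0.0, 0.4, 1.3):
    vx, vy, vz = polarization_components(a, b, omega, t)
    print(f"t={t}: ({vx:+.4f}, {vy:+.4f}, {vz}), radius {math.hypot(vx, vy):.6f}")
```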
We shall now determine the selection rule for $k$. We have

$$\begin{aligned}
[k(k+\hbar), z] &= [m_x^2, z] + [m_y^2, z] \\
&= -y m_x - m_x y + x m_y + m_y x \\
&= 2(m_y x - m_x y + i\hbar z) \\
&= 2(x m_y - m_x y) = 2(m_y x - y m_x).
\end{aligned}$$

Similarly,

$$[k(k+\hbar), x] = 2(y m_z - m_y z)$$

and

$$[k(k+\hbar), y] = 2(m_x z - x m_z).$$




Hence

$$\begin{aligned}
[k(k+\hbar), [k(k+\hbar), z]]
&= 2[k(k+\hbar), m_y x - m_x y + i\hbar z] \\
&= 2m_y[k(k+\hbar), x] - 2m_x[k(k+\hbar), y] + 2i\hbar[k(k+\hbar), z] \\
&= 4m_y(y m_z - m_y z) - 4m_x(m_x z - x m_z) + 2\{k(k+\hbar)z - z k(k+\hbar)\} \\
&= 4(m_x x + m_y y + m_z z)m_z - 4(m_x^2 + m_y^2 + m_z^2)z + 2\{k(k+\hbar)z - z k(k+\hbar)\}.
\end{aligned}$$

The first term here vanishes, from (3), leaving us with

$$[k(k+\hbar), [k(k+\hbar), z]] = -2\{k(k+\hbar)z + z k(k+\hbar)\},$$

which gives

$$k^2(k+\hbar)^2 z - 2k(k+\hbar)\,z\,k(k+\hbar) + z k^2(k+\hbar)^2 - 2\hbar^2\{k(k+\hbar)z + z k(k+\hbar)\} = 0. \tag{31}$$


Similar equations hold for $x$ and $y$. These equations are of the required type (28),
and give us the selection rule

$$k'^2(k'+\hbar)^2 - 2k'(k'+\hbar)k''(k''+\hbar) + k''^2(k''+\hbar)^2 - 2\hbar^2 k'(k'+\hbar) - 2\hbar^2 k''(k''+\hbar) = 0,$$

which reduces to

$$(k'+k''+2\hbar)(k'+k'')(k'-k''+\hbar)(k'-k''-\hbar) = 0.$$

A transition can take place between two states $k'$ and $k''$ only if one of these four
factors vanishes.
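The factorization can be confirmed with exact arithmetic. The following sketch uses hypothetical helper names. It compares the two sides on a grid of integral and half-integral values for two different values of $\hbar$, then lists the values $k''$ allowed from $k' = 2\hbar$ when $\hbar = 1$.

```python
from fractions import Fraction
from itertools import product

def quartic(k1, k2, hbar):
    """Left-hand side of the selection rule obtained from (31)."""
    A1, A2 = k1 * (k1 + hbar), k2 * (k2 + hbar)    # eigenvalues of k(k + hbar)
    return A1 ** 2 - 2 * A1 * A2 + A2 ** 2 - 2 * hbar ** 2 * (A1 + A2)

def factored(k1, k2, hbar):
    """Product of the four factors in the reduced form."""
    return (k1 + k2 + 2 * hbar) * (k1 + k2) * (k1 - k2 + hbar) * (k1 - k2 - hbar)

grid = [Fraction(n, 2) for n in range(9)]          # 0, 1/2, 1, ..., 4
identity_holds = all(quartic(p, q, Fraction(1)) == factored(p, q, Fraction(1))
                     for p, q in product(grid, repeat=2))
allowed_from_2 = [q for q in range(6) if factored(Fraction(2), Fraction(q), Fraction(1)) == 0]
print(identity_holds, allowed_from_2)
```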

Now the first of the factors, $(k'+k''+2\hbar)$, can never vanish, since the eigenvalues
of $k$ are all positive or zero. The second, $(k'+k'')$, can vanish only if $k' = 0$ and
$k'' = 0$. But transitions between two states with these values for $k$ cannot occur on
account of the selection rule for $m_z$, as may be seen from the following argument.
If two states (labelled respectively with a single prime and a double prime) are
such that $k' = 0$ and $k'' = 0$, then, according to the discussion at the end
of §39, each Cartesian component of the angular momentum must vanish for each
of them, i.e. $m_x' = m_y' = m_z' = 0$ and $m_x'' = m_y'' = m_z'' = 0$. The selection
rule for $m_z$ now shows that the matrix elements of $x$ and $y$ referring to the two
states must vanish, as the value of $m_z$ does not change during the transition,
and the similar selection rule for $m_x$ or $m_y$ shows that the matrix element of $z$ also
vanishes. Thus transitions between the two states cannot occur. Our selection
rule for $k$ now reduces to

$$(k'-k''+\hbar)(k'-k''-\hbar) = 0,$$




showing that $k$ must change by $\pm\hbar$. This selection rule may be written

$$k'^2 - 2k'k'' + k''^2 - \hbar^2 = 0,$$

and since this is the condition that a matrix element $(k'|z|k'')$ shall not vanish,
we get the equation

$$k^2 z - 2kzk + zk^2 - \hbar^2 z = 0$$

or

$$[k, [k, z]] = -z, \tag{32}$$

a result which could not easily be obtained in a more direct way.


43. The Zeeman Effect for the Hydrogen Atom 


We shall now consider the system of a hydrogen atom in a uniform magnetic field.
The Hamiltonian (1) with $V = -e^2/r$, which describes the hydrogen atom in no
external field, gets modified by the magnetic field, the modification, according to
classical mechanics, consisting in the replacement of the components of momentum,
$p_x, p_y, p_z$, by $p_x + (e/c)A_x$, $p_y + (e/c)A_y$, $p_z + (e/c)A_z$, where $A_x, A_y, A_z$ are
the components of the vector potential describing the field. For a uniform field of
magnitude $\mathcal{H}$ in the direction of the $z$-axis we may take $A_x = -\tfrac12\mathcal{H}y$, $A_y = \tfrac12\mathcal{H}x$,
$A_z = 0$. The classical Hamiltonian will then be

$$H = \frac{1}{2m}\left\{\left(p_x - \frac{e}{2c}\mathcal{H}y\right)^2 + \left(p_y + \frac{e}{2c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r}.$$

This classical Hamiltonian may be taken over into the quantum theory if we add on
to it a term giving the effect of the spin of the electron. According to experimental
evidence and according to the theory of Chapter XII, the electron has a magnetic
moment $-(e\hbar/2mc)\boldsymbol{\sigma}$, where $\boldsymbol{\sigma}$ is a vector with the properties given in §19.
The energy of this magnetic moment in the magnetic field will be $(e\hbar\mathcal{H}/2mc)\sigma_z$.
Thus the total quantum Hamiltonian will be

$$H = \frac{1}{2m}\left\{\left(p_x - \frac{e}{2c}\mathcal{H}y\right)^2 + \left(p_y + \frac{e}{2c}\mathcal{H}x\right)^2 + p_z^2\right\} - \frac{e^2}{r} + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z. \tag{33}$$
There ought strictly to be other terms in this Hamiltonian giving the interaction 
of the magnetic moment of the electron with the electric field of the nucleus of 
the atom, but this effect is small, of the same order of magnitude as the relativistic 
variation of the mass of the electron with its velocity, and will be neglected here. 
It will be taken into account in the relativistic theory of the electron given in 
Chapter XII. 

If the magnetic field is not too large, we can neglect terms involving $\mathcal{H}^2$, so that
the Hamiltonian (33) reduces to

$$\begin{aligned}
H &= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(x p_y - y p_x) + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z \\
&= \frac{1}{2m}(p_x^2 + p_y^2 + p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(m_z + \hbar\sigma_z).
\end{aligned} \tag{34}$$


The extra terms due to the magnetic field are now $(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z)$.
But these extra terms commute with the total Hamiltonian and are thus
constants of the motion. This makes the problem very easy. The stationary
states of the system, i.e. the eigenstates of the Hamiltonian (34), will be those
eigenstates of the Hamiltonian for no field that are simultaneously eigenstates
of the observables $m_z$ and $\sigma_z$, or at least of the one observable $m_z + \hbar\sigma_z$,
and the energy-levels of the system will be those for the system with no field,
given by (27) if one considers only closed states, increased by an eigenvalue of
$(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z)$. Thus stationary states of the system with no field which
are spatially quantized in the $z$-direction (i.e. for which $m_z$ has the numerical
value $m_z'$, an integral multiple of $\hbar$) and for which also $\sigma_z$ has the numerical value
$\sigma_z' = \pm 1$, will still be stationary states when the field is applied. Their energy will
be increased by an amount consisting of the sum of two parts, a part $(e\mathcal{H}/2mc)m_z'$
arising from the orbital motion, which part may be considered as due to an orbital
magnetic moment $-em_z'/2mc$, and a part† $(e\mathcal{H}/2mc)\hbar\sigma_z'$ arising from the spin.
The ratio of the orbital magnetic moment to the orbital angular momentum $m_z'$ is
$-e/2mc$, which is half the ratio $-e/mc$ of the spin magnetic moment $-(e\hbar/2mc)\boldsymbol{\sigma}$ to the
spin angular momentum $\tfrac12\hbar\boldsymbol{\sigma}$. This fact is sometimes referred to as the magnetic
anomaly of the spin.

Since the energy-levels now involve $m_z$, the selection rule for $m_z$ obtained
in the preceding section becomes capable of direct comparison with experiment.
We take a Heisenberg representation in which, among other constants of
the motion, $m_z$ and $\sigma_z$ are diagonal. The selection rule for $m_z$ now requires
$m_z'$ to change by $\hbar$, $0$, or $-\hbar$, while $\sigma_z$, since it commutes with the electric
displacement, will not change at all. Thus the energy difference between the two
states taking part in the transition process will differ by an amount $e\hbar\mathcal{H}/2mc$, $0$,
or $-e\hbar\mathcal{H}/2mc$ from its value for no magnetic field. Hence, from Bohr's frequency
condition (frequency equals energy difference divided by $2\pi\hbar$), the frequency of the
associated electromagnetic radiation will differ by $e\mathcal{H}/4\pi mc$, $0$, or $-e\mathcal{H}/4\pi mc$
from that for no magnetic field. This means
that each spectral line for no magnetic field gets split up by the field into three
components. If one considers radiation moving in the $z$-direction, then from (30)
the two outer components will be circularly polarized, while the central undisplaced
one will be of zero intensity. These results are in agreement with experiment and
also with the classical theory of the Zeeman effect. The agreement with the classical
theory ceases, however, when one takes into account relativistic mechanics and the
interaction of the spin with the electric field of the nucleus.
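The triplet can be illustrated numerically. In the sketch below, the approximate CGS values for $e$, $m$ and $c$ are supplied here and are not from the text, and the helper names are hypothetical. Level shifts come from (34) in units of $e\hbar\mathcal{H}/2mc$. Transitions are restricted to $m_z'$ changing by $\hbar$, $0$ or $-\hbar$ with $\sigma_z'$ unchanged. The displacement $e\mathcal{H}/4\pi mc$ is evaluated for a field of $10^4$ gauss.

```python
import math

# Approximate CGS constants: assumed values for this sketch, not from the text.
E_CHARGE = 4.803e-10     # esu
M_ELECTRON = 9.109e-28   # g
C_LIGHT = 2.998e10       # cm/s

def level_shift(m, sigma):
    """Extra energy (e H/2mc)(m_z' + hbar sigma_z') from (34), in units of
    e hbar H / 2mc; m = m_z'/hbar is an integer and sigma = sigma_z' = +1 or -1."""
    return m + sigma

def line_displacements(upper_m, lower_m):
    """Distinct shifts of a line, in units of e hbar H / 2mc, for transitions in
    which m_z' changes by hbar, 0 or -hbar and sigma_z' does not change."""
    shifts = set()
    for m1 in upper_m:
        for m2 in lower_m:
            if abs(m1 - m2) <= 1:                   # selection rule for m_z
                for sigma in (+1, -1):              # sigma_z is unchanged
                    shifts.add(level_shift(m1, sigma) - level_shift(m2, sigma))
    return sorted(shifts)

def frequency_displacement(field_gauss):
    """e H / (4 pi m c) in Hz: the displacement of the two outer components."""
    return E_CHARGE * field_gauss / (4 * math.pi * M_ELECTRON * C_LIGHT)

print(line_displacements(range(-1, 2), [0]))        # upper m_z'/hbar = -1, 0, 1; lower 0
print(f"{frequency_displacement(1.0e4):.3e} Hz")   # for a field of 10^4 gauss
```

Whatever the ranges of $m_z'$ in the two levels, the spin contribution cancels and only the three displacements $-1, 0, +1$ (in units of $e\hbar\mathcal{H}/2mc$) survive.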


†[Original has an h incorrectly.]




44. Combination of Angular Momenta 


Suppose we have two particles moving in the central field of force, having as angular
momenta the vectors $\mathbf{m}$ and $\boldsymbol{\mu}$. We can introduce the dynamical variables $k$ and $\kappa$,
defined by (12) and

$$\kappa + \tfrac12\hbar = (\mu_x^2 + \mu_y^2 + \mu_z^2 + \tfrac14\hbar^2)^{1/2}$$

respectively, to describe the magnitudes of these vectors. The total angular
momentum will then be the vector $\mathbf{M} = \mathbf{m} + \boldsymbol{\mu}$, whose magnitude $K$ is defined by

$$K + \tfrac12\hbar = (M_x^2 + M_y^2 + M_z^2 + \tfrac14\hbar^2)^{1/2}.$$

Each of the dynamical variables $k$ and $\kappa$ commutes with all the components of $\mathbf{m}$,
$\boldsymbol{\mu}$, and $\mathbf{M}$. Thus $k$, $\kappa$, $K$ will commute with each other and can be given numerical
values simultaneously. Our problem now is to determine the possible numerical
values for $K$ when $k$ and $\kappa$ have given numerical values.

The easiest way of solving this problem is to suppose $k$ and $\kappa$ are equal to
two given numbers, as we can do since they commute with all the dynamical
variables mentioned in the problem, and then to use a matrix representation in
which $m_z$ and $\mu_z$ are diagonal. We can ignore all dynamical variables describing
the dynamical system that are not functions of the components of $\mathbf{m}$ and $\boldsymbol{\mu}$.
Our matrix representation will then have only a finite number of rows and columns,
each labelled by a number $m_z'$, having one of the values $k, k-\hbar, k-2\hbar, \ldots, -k$,
and a number $\mu_z'$, having one of the values $\kappa, \kappa-\hbar, \kappa-2\hbar, \ldots, -\kappa$. The values
of $M_z' = m_z' + \mu_z'$ will then be $k+\kappa, k+\kappa-\hbar, k+\kappa-2\hbar, \ldots, -k-\kappa$. The number
of times each of them occurs is given by the following scheme (if we assume for
definiteness that $k \ge \kappa$),

$$\begin{array}{c|ccccccccccc}
M_z' & k+\kappa & k+\kappa-\hbar & k+\kappa-2\hbar & \cdots & k-\kappa & k-\kappa-\hbar & \cdots & -k+\kappa & -k+\kappa-\hbar & \cdots & -k-\kappa \\
\hline
\text{times} & 1 & 2 & 3 & \cdots & 2\kappa/\hbar+1 & 2\kappa/\hbar+1 & \cdots & 2\kappa/\hbar+1 & 2\kappa/\hbar & \cdots & 1
\end{array} \tag{35}$$


If we now make a transformation to a representation in which $K$ and $M_z$
are diagonal, the number of rows and columns of the matrices for which $M_z$ has
a given value $M_z'$ must remain unaltered. If $K', K'', \ldots$ are the possible values for $K$,
there will be a set of rows and columns having the $M_z$-values $K', K'-\hbar, \ldots, -K'$,
another set having the $M_z$-values $K'', K''-\hbar, \ldots, -K''$, etc. For $M_z' \ge 0$, the number
of these sets whose highest value is $M_z'$ is therefore the excess of the number of times
$M_z'$ occurs over the number of times $M_z' + \hbar$ occurs. Comparing this distribution of
$M_z$-values with (35), we see that the possible values for $K$ must be†

$$k+\kappa,\; k+\kappa-\hbar,\; k+\kappa-2\hbar,\; \ldots,\; k-\kappa, \tag{36}$$

each occurring once.


†[The original ends the sequence at $k-\kappa$.]




This result is a quite general one applying to the combination of any two
angular momenta, not necessarily the orbital angular momenta of two particles.
For example, it could be applied to the orbital angular momentum and spin of
an electron. In this case, since the spin angular momentum has the magnitude
$\kappa = \tfrac12\hbar$, it shows that when the orbital angular momentum has the magnitude $k$
(not zero), the combined angular momentum can have only one or other of the two
magnitudes $k \pm \tfrac12\hbar$.
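The counting argument behind (35) and (36) can be sketched in Python, working in units $\hbar = 1$ (an assumption of the sketch) and with exact fractions so that half-integral magnitudes are handled. The helper names are hypothetical. The multiplicities of $M_z'$ are tallied as in (35). Complete sets $K', K'-\hbar, \ldots, -K'$ are then removed, largest first, which yields the values (36).

```python
from collections import Counter
from fractions import Fraction

def m_values(j):
    """j, j-1, ..., -j in units of hbar (j integral or half-integral)."""
    return [j - n for n in range(int(2 * j) + 1)]

def mz_multiplicities(k, kappa):
    """How often each value M_z' = m_z' + mu_z' occurs: the scheme (35)."""
    return Counter(m1 + m2 for m1 in m_values(k) for m2 in m_values(kappa))

def possible_K(k, kappa):
    """Remove complete sets K', K'-1, ..., -K' from the M_z' distribution,
    largest K' first, as in the argument leading to (36)."""
    remaining = mz_multiplicities(k, kappa)
    values = []
    while any(n > 0 for n in remaining.values()):
        top = max(m for m, n in remaining.items() if n > 0)
        values.append(top)
        for m in m_values(top):
            remaining[m] -= 1
    return values

for k, kappa in [(1, Fraction(1, 2)), (2, 1), (Fraction(3, 2), Fraction(3, 2))]:
    print(k, kappa, [str(K) for K in possible_K(k, kappa)])
```

For example, `possible_K(1, Fraction(1, 2))` gives the two magnitudes $k \pm \tfrac12\hbar$ with $k = \hbar$. In every case, the total number of states $\sum (2K/\hbar + 1)$ equals $(2k/\hbar + 1)(2\kappa/\hbar + 1)$.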

We now have a general method for dealing with complicated atomic systems.
For an isolated system the total angular momentum $\mathbf{M}$ is always a constant of
the motion, and its magnitude $K$ together with one of its components $M_z$ will
be two commuting constants of the motion. We try to express $\mathbf{M}$ as the sum of
two angular momenta $\mathbf{m}$ and $\boldsymbol{\mu}$ whose magnitudes $k$ and $\kappa$ are constants of the
motion. If we can do this, then we try to express either of the parts, $\mathbf{m}$ say, itself as
a sum of two angular momenta, $\mathbf{m}_1$ and $\mathbf{m}_2$ say, whose magnitudes $k_1$ and $k_2$ are
constants of the motion, and so on. We obtain in this way a series of constants
of the motion $M_z, K, k, \kappa, k_1, k_2, \ldots$, which all commute with each other and
may, if there are enough of them, be taken as defining a Heisenberg representation.
The possible numerical values for the $K, k, \kappa, \ldots$ specifying a row and column are
restricted by the general rule (36). The energy will be some function of $K, k, \kappa$,
$k_1, k_2, \ldots$, but independent of $M_z$. In general one cannot secure that $k, \kappa, k_1, k_2, \ldots$
are exactly constants of the motion, but one may be able to choose them so that
they are approximately so and then apply a perturbation method, as discussed in
the next chapter.

We shall now obtain the selection rule for the magnitude $K$ of the total
angular momentum $\mathbf{M}$ of a general atomic system. Let $\mathbf{m}$ be the orbital angular
momentum of one of the electrons, whose coordinates are $x, y, z$ say, and let
$\mathbf{M} - \mathbf{m} = \boldsymbol{\mu}$. It is not necessary for the present discussion that the magnitudes $k$
and $\kappa$ of the two angular momenta $\mathbf{m}$ and $\boldsymbol{\mu}$ into which† $\mathbf{M}$ has been split up should
be constants of the motion. We must obtain the condition that the $(K', K'')$ matrix
element of $x$, $y$, or $z$ shall not vanish. This is evidently the same as the condition
that the $(K', K'')$ matrix element of $\lambda_1$, $\lambda_2$, or $\lambda_3$ shall not vanish, where $\lambda_1$, $\lambda_2$,
and $\lambda_3$ are any three independent linear functions of $x$, $y$ and $z$ with numerical
coefficients, or more generally with any coefficients that commute with $K$ and are
thus represented by matrices which are diagonal with respect to $K$. Let

$$\begin{aligned}
\lambda_0 &= M_x x + M_y y + M_z z, \\
\lambda_x &= M_y z - M_z y - i\hbar x, \\
\lambda_y &= M_z x - M_x z - i\hbar y, \\
\lambda_z &= M_x y - M_y x - i\hbar z.
\end{aligned}$$


†[Original has M_]




We have

$$M_x\lambda_x + M_y\lambda_y + M_z\lambda_z = \sum_{xyz}(M_x M_y z - M_x M_z y - i\hbar M_x x) = \sum_{xyz}(M_x M_y - M_y M_x - i\hbar M_z)\,z = 0 \tag{37}$$

from the general condition (11) for angular momentum. Thus $\lambda_x$, $\lambda_y$ and $\lambda_z$
are not linearly independent functions of $x$, $y$ and $z$. Any two of them, however,
together with $\lambda_0$ are three linearly independent functions of $x$, $y$ and $z$ and may
be taken as the above $\lambda_1, \lambda_2, \lambda_3$, since the coefficients $M_x, M_y, M_z$ all commute
with $K$. Our problem thus reduces to finding the condition that the $(K', K'')$
matrix elements of $\lambda_0$, $\lambda_x$, $\lambda_y$ and $\lambda_z$ shall not vanish. The physical meanings of
these $\lambda$'s are that $\lambda_0$ is proportional to the component of the vector $(x, y, z)$ in
the direction of the vector $\mathbf{M}$, and $\lambda_x, \lambda_y, \lambda_z$ are proportional to the Cartesian
components of the component of $(x, y, z)$ perpendicular to $\mathbf{M}$.
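From (4) together with the condition that $x$, $y$ and $z$ commute with $\boldsymbol{\mu}$,
we obtain

$$[M_z, x] = [m_z + \mu_z, x] = y, \qquad [M_z, y] = -x, \qquad [M_z, z] = 0. \tag{38}$$

Hence

$$\begin{aligned}
[M_z, \lambda_0] &= [M_z, M_x]x + M_x[M_z, x] + [M_z, M_y]y + M_y[M_z, y] \\
&= M_y x + M_x y - M_x y - M_y x = 0.
\end{aligned}$$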


Thus $\lambda_0$ commutes with $M_z$, and from symmetry it must commute also with $M_x$
and $M_y$, so that it must commute with $K$. It follows that only the diagonal
elements $(K'|\lambda_0|K')$ of $\lambda_0$ can differ from zero, so the selection rule is that $K$
cannot change so far as this component of the electric displacement is concerned.
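With further applications of (38) we obtain

$$\begin{aligned}
[M_z, \lambda_x] &= [M_z, M_y]z - M_z[M_z, y] - i\hbar[M_z, x] \\
&= -M_x z + M_z x - i\hbar y = \lambda_y, \\
[M_z, \lambda_y] &= M_z[M_z, x] - [M_z, M_x]z - i\hbar[M_z, y] \\
&= M_z y - M_y z + i\hbar x = -\lambda_x, \\
[M_z, \lambda_z] &= [M_z, M_x]y + M_x[M_z, y] - [M_z, M_y]x - M_y[M_z, x] \\
&= M_y y - M_x x + M_x x - M_y y = 0.
\end{aligned}$$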


These relations between $M_z$ and $\lambda_x, \lambda_y, \lambda_z$ are of exactly the same form as
the relations (4) and (5) between $m_z$ and $x, y, z$, and also (37) is of the same form
as (3). The dynamical variables $\lambda_x, \lambda_y, \lambda_z$ thus have the same properties relative
to the angular momentum $\mathbf{M}$ as $x, y, z$ have relative to $\mathbf{m}$. The deduction in §42
of the selection rule for $k$ when the electric displacement is proportional to $(x, y, z)$
can therefore be taken over and applied to the selection rule for $K$ when the electric
displacement is proportional to $(\lambda_x, \lambda_y, \lambda_z)$. We find in this way that, so far as $\lambda_x$,
$\lambda_y$ and $\lambda_z$ are concerned, the selection rule for $K$ is that it must change by $\pm\hbar$.
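Relation (37) depends only on the commutation rules (11), and the relations for $[M_z, \lambda]$ depend only on (11) and (38). Both can therefore be spot-checked on finite matrices. The sketch below is an illustration under stated assumptions, not part of the text. It takes $\hbar = 1$ and uses angular momenta of magnitudes $\hbar$ and $\tfrac12\hbar$ for $\mathbf{m}$ and $\boldsymbol{\mu}$. Coordinates cannot be represented by finite matrices, so the components of $\mathbf{m}$ itself stand in for $x, y, z$; they commute with $\boldsymbol{\mu}$ and satisfy (38). The bracket is $[u, v] = (uv - vu)/i\hbar$.

```python
import math

def spin_matrices(j):
    """(J_x, J_y, J_z) for magnitude j, hbar = 1, basis ordered m = j, j-1, ..., -j."""
    ms = [j - n for n in range(int(round(2 * j)) + 1)]
    d = len(ms)
    jz = [[complex(ms[r]) if r == c else 0j for c in range(d)] for r in range(d)]
    jp = [[0j] * d for _ in range(d)]
    for c in range(1, d):                        # J+ |m> = sqrt(j(j+1) - m(m+1)) |m+1>
        jp[c - 1][c] = complex(math.sqrt(j * (j + 1) - ms[c] * (ms[c] + 1)))
    jm = [[jp[c][r].conjugate() for c in range(d)] for r in range(d)]   # J- = (J+)^dagger
    jx = [[(jp[r][c] + jm[r][c]) / 2 for c in range(d)] for r in range(d)]
    jy = [[(jp[r][c] - jm[r][c]) / 2j for c in range(d)] for r in range(d)]
    return jx, jy, jz

def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]

def comb(*terms):
    """Linear combination sum(w * A) over (w, A) pairs of equal-sized square matrices."""
    d = len(terms[0][1])
    return [[sum(w * A[r][c] for w, A in terms) for c in range(d)] for r in range(d)]

def kron(a, b):
    n = len(b)
    return [[a[r // n][c // n] * b[r % n][c % n] for c in range(len(a) * n)]
            for r in range(len(a) * n)]

def eye(d):
    return [[1 + 0j if r == c else 0j for c in range(d)] for r in range(d)]

def bracket(u, v):
    """Quantum bracket [u, v] = (uv - vu) / (i hbar), with hbar = 1."""
    return comb((-1j, mul(u, v)), (1j, mul(v, u)))

def is_zero(a, tol=1e-9):
    return all(abs(v) < tol for row in a for v in row)

m_ops = spin_matrices(1.0)     # plays the part of m; its components stand in for x, y, z
mu_ops = spin_matrices(0.5)    # plays the part of mu
x, y, z = (kron(comp, eye(2)) for comp in m_ops)      # commute with mu, satisfy (38)
Mx, My, Mz = (comb((1, kron(a, eye(2))), (1, kron(eye(3), b))) for a, b in zip(m_ops, mu_ops))

lam0 = comb((1, mul(Mx, x)), (1, mul(My, y)), (1, mul(Mz, z)))
lamx = comb((1, mul(My, z)), (-1, mul(Mz, y)), (-1j, x))
lamy = comb((1, mul(Mz, x)), (-1, mul(Mx, z)), (-1j, y))
lamz = comb((1, mul(Mx, y)), (-1, mul(My, x)), (-1j, z))

checks = {
    "(37)": is_zero(comb((1, mul(Mx, lamx)), (1, mul(My, lamy)), (1, mul(Mz, lamz)))),
    "[Mz, lam0] = 0": is_zero(bracket(Mz, lam0)),
    "[Mz, lamx] = lamy": is_zero(comb((1, bracket(Mz, lamx)), (-1, lamy))),
    "[Mz, lamy] = -lamx": is_zero(comb((1, bracket(Mz, lamy)), (1, lamx))),
    "[Mz, lamz] = 0": is_zero(bracket(Mz, lamz)),
}
print(checks)
```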

Collecting results, we have as the selection rule for $K$ that it must change by
$0$ or $\pm\hbar$. We have considered the electric displacement produced by only one of
the electrons, but the same selection rule must hold for each electron and thus also 
for the total electric displacement. 


VIII. PERTURBATION THEORY


45. General Remarks 


IN the preceding chapter exact treatments were given of some simple dynamical 
systems in the quantum theory. Most quantum problems, however, cannot be 
solved exactly with the present resources of mathematics, as they lead to equations 
whose solutions cannot be expressed in finite terms with the help of the ordinary 
functions of analysis. For such problems one can often use a perturbation method. 
This consists in splitting up the Hamiltonian into two parts, one of which 
must be simple and the other small. The first part may then be considered
as the Hamiltonian of a simplified or unperturbed system, which can be dealt 
with exactly, and the addition of the second will then require small corrections, 
of the nature of a perturbation, in the solution for the unperturbed system. 
The requirement that the first part shall be simple requires in practice that it shall 
not involve the time explicitly. If the second part contains a small numerical 
factor $\varepsilon$, we can obtain the solution of our equations for the perturbed system in
the form of a power series in $\varepsilon$, which, provided it converges, will give the answer
to our problem with any desired accuracy. Even when the series does not converge, 
the first approximation obtained by means of it is usually fairly accurate. 

There are two distinct methods in perturbation theory. In one of these 
the perturbation is considered as causing a modification of the states of motion 
of the unperturbed system (with the space-time meaning of ‘states’). In the 
other we do not consider any modification to be made in the states of the 
unperturbed system, but we suppose that the perturbed system, instead of 
remaining permanently in one of these states, is continually changing from one 
to another, or making transitions, under the influence of the perturbation. Which 
method is to be used in any particular case depends on the nature of the problem 
to be solved. The first method is useful usually only when the perturbing energy 
(the correction in the Hamiltonian for the undisturbed system) does not involve 
the time explicitly, and is then applied to the stationary states. It can be 
used for calculating things that do not refer to any definite time, such as the 
energy-levels of the stationary states of the perturbed system, or, in the case of
collision problems, the probability of scattering through a given angle. The second
method must, on the other hand, be used for solving all problems involving
a consideration of time, such as those about the transient phenomena that occur
when the perturbation is suddenly applied, or more generally problems in which
the perturbation varies with the time in any way (i.e. in which the perturbing
energy involves the time explicitly in an arbitrary way). Again, this second method
must be used in collision problems, even though the perturbing energy does not
here involve the time explicitly, if one wishes to calculate absorption and emission
probabilities, since these probabilities, unlike a scattering probability, cannot be
defined without reference to a state of affairs that varies with the time.


[No other methods are considered and the work was published before ready access to
digital computers.]


46. The change in the energy-levels caused by a perturbation


The first of the above-mentioned methods will now be applied to the calculation of 
the changes in the energy-levels of a system caused by a perturbation. We assume 
the perturbing energy, like the Hamiltonian for the unperturbed system, 
not to involve the time explicitly. Our problem has a meaning, of course, 
only provided the energy-levels of the unperturbed system are discrete and 
the differences between them are large compared with the changes in them 
caused by the perturbation. This fact results in the treatment of perturbation 
problems by the first method having some different features according to whether 
the energy-levels of the unperturbed system are discrete or continuous. 
Let the Hamiltonian of the perturbed system be 


$$H = H_0 + V, \tag{1}$$

$H_0$ being the Hamiltonian of the unperturbed system and $V$ the small perturbing
energy. By hypothesis each eigenvalue $H'$ of $H$ lies very close to one and only
one eigenvalue $H_0'$ of $H_0$. We shall use the same number of primes to specify any
eigenvalue of $H$ and the eigenvalue of $H_0$ to which it lies very close. Thus we shall
have $H''$ differing from $H_0''$ by a small quantity of order $V$ and differing from $H_0'$
by a quantity that is not small unless $H_0' = H_0''$. We must now take care always
to use different numbers of primes to specify eigenvalues of $H$ and $H_0$ which we do
not want to lie very close together.
We have to solve the equation

$$H\psi = \{H_0 + V\}\psi = H'\psi$$

or

$$\{H' - H_0\}\psi = V\psi. \tag{2}$$


Let $\psi_0$ be an eigen-$\psi$ of $H_0$ belonging to the eigenvalue $H_0'$ and suppose the $\psi$
and $H'$ that satisfy (2) to differ from $\psi_0$ and $H_0'$ only by small quantities and to be
expressed as

$$\begin{aligned}
\psi &= \psi_0 + \psi_1 + \psi_2 + \cdots, \\
H' &= H_0' + a_1 + a_2 + \cdots,
\end{aligned} \tag{3}$$

where $\psi_1$ and $a_1$ are of the first order of smallness (i.e. the same order as $V$),
$\psi_2$ and $a_2$ are of the second order, and so on. Substituting these expressions
in (2), we obtain

$$\{H_0' - H_0 + a_1 + a_2 + \cdots\}\{\psi_0 + \psi_1 + \psi_2 + \cdots\} = V\{\psi_0 + \psi_1 + \cdots\}.$$

If we now separate the terms of zero order, of the first order, of the second order,
and so on, we get the following set of equations,

$$\begin{aligned}
\{H_0' - H_0\}\psi_0 &= 0, \\
\{H_0' - H_0\}\psi_1 + a_1\psi_0 &= V\psi_0, \\
\{H_0' - H_0\}\psi_2 + a_1\psi_1 + a_2\psi_0 &= V\psi_1, \\
&\;\;\vdots
\end{aligned} \tag{4}$$

The first of these equations tells us, what we have already assumed, that $\psi_0$ is
an eigen-$\psi$ of $H_0$ belonging to the eigenvalue $H_0'$. The others enable us to calculate
the various corrections $\psi_1, \psi_2, \ldots, a_1, a_2, \ldots$.

For the further discussion of these equations it is convenient to introduce
a representation in which $H_0$ is diagonal, i.e. a Heisenberg representation for
the unperturbed system, and to take $H_0$ itself as one of the observables whose
eigenvalues label the representatives. Let the others, in the event of others being
necessary, as is the case when there is more than one eigenstate of $H_0$ belonging
to any eigenvalue, be called $\beta$'s. The representatives of $\psi$, $\psi_0$, $\psi_1$, $\psi_2$ are then
$(H_0''\beta''|)$, $(H_0''\beta''|0)$, $(H_0''\beta''|1)$, $(H_0''\beta''|2)$ respectively. Since $\psi_0$ is an eigen-$\psi$ of $H_0$
belonging to the eigenvalue $H_0'$, we have

$$(H_0''\beta''|0) = \delta_{H_0''H_0'}\,(\beta''|0), \tag{5}$$

where $(\beta''|0)$ is some function of the variables $\beta''$. With the help of this result
the second of equations (4), written in terms of representatives, becomes

$$\{H_0' - H_0''\}(H_0''\beta''|1) + a_1\,\delta_{H_0''H_0'}\,(\beta''|0) = \sum_{\beta'}(H_0''\beta''|V|H_0'\beta')(\beta'|0). \tag{6}$$

Putting $H_0'' = H_0'$ here, we get

$$a_1(\beta''|0) = \sum_{\beta'}(H_0'\beta''|V|H_0'\beta')(\beta'|0). \tag{7}$$




Equation (7) is of the form of the standard equation in the theory of eigenvalues,
so far as the variables $\beta'$ are concerned. It shows that the various possible values
for $a_1$ are the eigenvalues of the matrix $(H_0'\beta''|V|H_0'\beta')$. This matrix is a part
of the representative of the perturbing energy in the Heisenberg representation
for the unperturbed system, namely, the part consisting of those elements that
refer to the same unperturbed energy-level $H_0'$ for their row and column. Each of
these values for $a_1$ gives, to the first order, an energy-level of the perturbed system
lying close to the energy-level $H_0'$ of the unperturbed system.† There may thus be
several energy-levels of the perturbed system lying close to the one energy-level $H_0'$
of the unperturbed system, their number being anything not exceeding the number
of independent states of the unperturbed system belonging to the energy-level $H_0'$.
In this way the perturbation may cause a separation or partial separation of
the energy-levels that coincide at $H_0'$ for the unperturbed system.

Equation (7) also determines, to the zero order, the representatives $(H_0''\beta''|0)$ of
the stationary states of the perturbed system belonging to energy-levels lying close
to $H_0'$, any solution $(\beta''|0)$ of (7) substituted in (5) giving one such representative.
Each of these stationary states of the perturbed system approximates to one of
the stationary states of the unperturbed system, but the converse, that each
stationary state of the unperturbed system approximates to one of the stationary
states of the perturbed system, is not true, since the general stationary state
of the unperturbed system belonging to the energy-level $H_0'$ is represented by
the right-hand side of (5) with an arbitrary function $(\beta''|0)$. The problem of finding
which stationary states of the unperturbed system approximate to stationary
states of the perturbed system, i.e. the problem of finding the solutions $(\beta''|0)$
of (7), corresponds to the problem of secular perturbations in classical mechanics.
It should be noted that the above results are independent of the values of all those
matrix elements of the perturbing energy which refer to two different energy-levels
$H_0'$ and $H_0''$ of the unperturbed system.
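The content of (7) can be illustrated numerically. In the sketch below, the unperturbed levels and the perturbation are hypothetical choices: a two-fold degenerate level at $H_0' = 0$, a third level at $2$, and $V = \varepsilon W$. Exact eigenvalues come from a plain cyclic Jacobi rotation routine, a standard method swapped in here rather than taken from the text. The first-order values $a_1$ are the eigenvalues of the $2\times2$ block of $V$ belonging to the degenerate level; the elements of $W$ connecting to the third level do not enter them. The discrepancy from the exact levels shrinks roughly a hundredfold when $\varepsilon$ is reduced tenfold.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=60, tol=1e-24):
    """Eigenvalues (ascending) of a real symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):                      # columns p, q of A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                      # rows p, q of J^T A J
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def first_order_splitting(block, eps):
    """a_1 from (7): eigenvalues of eps times the 2x2 block of W in the degenerate level."""
    (p, q), (_, r) = block
    mean, half = (p + r) / 2, math.hypot((p - r) / 2, q)
    return [eps * (mean - half), eps * (mean + half)]

H0 = [0.0, 0.0, 2.0]                  # illustrative: level 0 is two-fold degenerate
W = [[0.2, 1.0, 0.5],
     [1.0, -0.2, 0.3],
     [0.5, 0.3, 0.1]]                 # hypothetical perturbation, V = eps * W

def first_order_error(eps):
    H = [[(H0[i] if i == j else 0.0) + eps * W[i][j] for j in range(3)] for i in range(3)]
    exact = jacobi_eigenvalues(H)[:2]                   # the two levels near H0' = 0
    approx = first_order_splitting([W[0][:2], W[1][:2]], eps)
    return max(abs(e - f) for e, f in zip(exact, approx))

for eps in (0.1, 0.01, 0.001):
    print(eps, first_order_error(eps))
```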

Let us see what the above results become in the specially simple case when
there is only one stationary state of the unperturbed system belonging to each
energy-level.‡ In this case $H_0$ alone fixes the representation, no $\beta$'s being required.
The sum in (7) now reduces to a single term and we get

$$a_1 = (H_0'|V|H_0'). \tag{8}$$


† To distinguish these energy-levels one from another we should require some more elaborate
notation, since according to the present notation they must all be specified by the same number
of primes, namely, by the number of primes specifying the energy-level of the unperturbed
system from which they arise. For our present purposes, however, this more elaborate notation
is not required.

‡ A system with only one stationary state belonging to each energy-level is often called
non-degenerate and one with two or more stationary states belonging to an energy-level is called
degenerate, although these words are not very appropriate from the modern point of view.


46. The change in the energy-levels caused by a perturbation 165 


There is only one energy-level of the perturbed system lying close to any 
energy-level of the unperturbed system and the change in energy is equal, 
in the first order, to the corresponding diagonal element of the perturbing energy 
in the Heisenberg representation for the unperturbed system, or to the average 
value of the perturbing energy for the corresponding unperturbed state. The latter 
formulation of the result is the same as in classical mechanics when the unperturbed 
system is multiply periodic. 

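We shall proceed to calculate the second-order correction $a_2$ in the energy-level
for the case when the unperturbed system is non-degenerate. Equation (5) for this
case reads

$$\langle H_0''|0\rangle = \delta_{H_0''H_0'},$$

with neglect of an unimportant numerical factor, and equation (6)

$$\{H_0' - H_0''\}\langle H_0''|1\rangle + a_1\delta_{H_0''H_0'} = \langle H_0''|V|H_0'\rangle.$$

This gives us a value for $\langle H_0''|1\rangle$ when $H_0'' \neq H_0'$, namely

$$\langle H_0''|1\rangle = \frac{\langle H_0''|V|H_0'\rangle}{H_0' - H_0''}. \qquad (9)$$

The third of equations (4), written in terms of representatives, becomes

$$\{H_0' - H_0''\}\langle H_0''|2\rangle + a_1\langle H_0''|1\rangle + a_2\delta_{H_0''H_0'} = \sum_{H_0'''}\langle H_0''|V|H_0'''\rangle\langle H_0'''|1\rangle.$$

Putting $H_0'' = H_0'$ here, we get

$$a_1\langle H_0'|1\rangle + a_2 = \sum_{H_0'''}\langle H_0'|V|H_0'''\rangle\langle H_0'''|1\rangle,$$

which reduces, with the help of (8), to

$$a_2 = \sum_{H_0''' \neq H_0'}\langle H_0'|V|H_0'''\rangle\langle H_0'''|1\rangle.$$

Substituting for $\langle H_0'''|1\rangle$ from (9), we obtain finally

$$a_2 = \sum_{H_0'' \neq H_0'}\frac{\langle H_0'|V|H_0''\rangle\langle H_0''|V|H_0'\rangle}{H_0' - H_0''},$$

giving for the total energy change to the second order

$$a_1 + a_2 = \langle H_0'|V|H_0'\rangle + \sum_{H_0'' \neq H_0'}\frac{\langle H_0'|V|H_0''\rangle\langle H_0''|V|H_0'\rangle}{H_0' - H_0''}. \qquad (10)$$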
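As a numerical illustration of (8) and (10), the following Python sketch compares the second-order energies with the exact eigenvalues of a hypothetical two-level Hamiltonian $H_0 + V$. The function names, the real symmetric $V$ and the numbers are assumptions chosen only for the example, in arbitrary energy units.

```python
import math


def second_order_levels(E, V):
    """Perturbed energies to second order, eq. (10), for a non-degenerate H0.

    E -- list of unperturbed energy-levels H0'
    V -- Hermitian matrix (list of lists) of elements <H0'|V|H0''>
    """
    levels = []
    for n, En in enumerate(E):
        a1 = V[n][n].real                               # first order, eq. (8)
        a2 = sum((V[n][m] * V[m][n]).real / (En - E[m])
                 for m in range(len(E)) if m != n)      # second-order correction
        levels.append(En + a1 + a2)
    return levels


def exact_two_level(E, V):
    """Exact eigenvalues (ascending) of the 2x2 Hermitian matrix diag(E) + V."""
    a = E[0] + V[0][0].real
    d = E[1] + V[1][1].real
    mean, half = (a + d) / 2, (a - d) / 2
    root = math.sqrt(half * half + abs(V[0][1]) ** 2)
    return [mean - root, mean + root]


# Illustrative (assumed) values: levels 0 and 1, weak coupling 0.05.
E = [0.0, 1.0]
V = [[0.02, 0.05], [0.05, -0.01]]
print(second_order_levels(E, V))   # roughly [0.0175, 0.9925]
print(exact_two_level(E, V))       # roughly [0.01743, 0.99257]
```

With these numbers the two results differ by less than $10^{-4}$, a discrepancy of third order in $V$, as one expects from a formula correct to the second order.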


166 VII. PERTURBATION THEORY 


The method may be developed for the calculation of the higher approximations
if required. General recurrence formulas giving the $n$th order corrections in terms
of those of lower order have been obtained by Max Born, Werner Heisenberg and
Pascual Jordan.†


47. The perturbation considered as causing transitions


We shall now consider the second of the two perturbation methods mentioned
in §45. We suppose again that we have an unperturbed system governed by
a Hamiltonian $H_0$, which does not involve the time explicitly, and a perturbing
energy $V$ which can now be an arbitrary function of the time. The Hamiltonian
for the perturbed system is again $H = H_0 + V$. For the present method it does not
make any essential difference whether the energy-levels of the unperturbed system,
i.e. the eigenvalues of $H_0$, form a discrete or continuous set. We shall, however,
take the discrete case for definiteness.

We shall again work with a Heisenberg representation for the unperturbed
system, but as there will now be no advantage in taking $H_0$ itself as one of
the observables whose eigenvalues label the representatives, we shall suppose
we have a general set of $\alpha$'s to label the representatives. The representative of
$H_0$ will be diagonal, of the form

$$\langle\alpha'|H_0|\alpha''\rangle = H_0'\,\delta_{\alpha'\alpha''}, \qquad (11)$$

like (14) of §32. We shall have to make use of both the representations considered
at the end of §32, differing one from the other with regard to the phase factors
and fixed in the Heisenberg and Schrödinger pictures respectively. Equation (11)
holds for both. As in §32, we shall use stars to distinguish representatives
in the representation which is fixed in the Schrödinger picture. The two
representatives of a $\psi$ are connected by

$$\langle\alpha'|\rangle^* = e^{-iH_0't/\hbar}\langle\alpha'|\rangle, \qquad (12)$$


† Heisenberg, W. „Über quantentheoretische Umdeutung kinematischer und mechanischer
Beziehungen“, Zeitschrift für Physik, 33, 1925, pp. 879-893 [doi: 10.1007/BF01328377];
Born, M. and Jordan, P. „Zur Quantenmechanik“, Zeitschrift für Physik, 34, 1925,
pp. 858-888 [doi: 10.1007/BF01328531]; Born, M., Heisenberg, W. and Jordan, P. „Zur
Quantenmechanik II“, Zeitschrift für Physik, 35, 1926, pp. 557-615 [doi: 10.1007/BF01379806].




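like equation (17) of §32. The Schrödinger wave equation, which holds with
the starred representatives, reads

$$i\hbar\frac{d}{dt}\langle\alpha'|\rangle^* = \sum_{\alpha''}\langle\alpha'|H_0 + V|\alpha''\rangle^*\langle\alpha''|\rangle^* = H_0'\langle\alpha'|\rangle^* + \sum_{\alpha''}\langle\alpha'|V|\alpha''\rangle^*\langle\alpha''|\rangle^*. \qquad (13)$$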


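The representative $\langle\alpha'|\rangle$ of a state will not satisfy the Schrödinger equation, but will
satisfy instead the following equation, obtained by substituting (12) in (13),

$$i\hbar\,e^{-iH_0't/\hbar}\frac{d}{dt}\langle\alpha'|\rangle + H_0'\,e^{-iH_0't/\hbar}\langle\alpha'|\rangle = H_0'\,e^{-iH_0't/\hbar}\langle\alpha'|\rangle + \sum_{\alpha''}\langle\alpha'|V|\alpha''\rangle^*\,e^{-iH_0''t/\hbar}\langle\alpha''|\rangle.$$

This reduces to

$$i\hbar\frac{d}{dt}\langle\alpha'|\rangle = \sum_{\alpha''}e^{i(H_0'-H_0'')t/\hbar}\langle\alpha'|V|\alpha''\rangle^*\langle\alpha''|\rangle = \sum_{\alpha''}\langle\alpha'|V|\alpha''\rangle\langle\alpha''|\rangle. \qquad (14)$$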


The representative $\langle\alpha'|V|\alpha''\rangle^*$ of the perturbing energy $V$ does not depend on $t$,
except in so far as $V$ itself involves $t$ explicitly, but the representative $\langle\alpha'|V|\alpha''\rangle$
appearing in our equation (14) varies rapidly with $t$, according to the Heisenberg
law $e^{i(H_0'-H_0'')t/\hbar}$, when one neglects the explicit dependence of $V$ on $t$.

Equation (14) is the fundamental equation of the present method in
perturbation theory. It is an exact equation, no use having yet been made of
the fact that the perturbation is small. It shows how the representative of a state
of a perturbed system varies with the time when the representation is chosen so
that the whole of this variation is caused by the perturbation, and thus expresses
most clearly the way in which the perturbation may be considered as causing
a continual change in the state of the system. At any instant the probability of
the $\alpha$'s having specified values $\alpha'$ is

$$P' = |\langle\alpha'|\rangle^*|^2 = |\langle\alpha'|\rangle|^2, \qquad (15)$$

provided the ket $|\rangle$ representing the state is normalized.

We shall now obtain an approximate solution to equation (14) for a given
initial value of the representative $\langle\alpha'|\rangle$ of the state. Since $V$ is small, the rate of
change of $\langle\alpha'|\rangle$ is small and $\langle\alpha'|\rangle$ remains approximately equal to its initial value,
at any rate for times that do not differ too much from the initial time. We can thus
obtain a first approximation by substituting for $\langle\alpha''|\rangle$ in the right-hand side of (14)
its initial value and then performing a simple integration. We may then obtain
a second approximation by substituting the first approximation in the right-hand
side of (14), and so on indefinitely.
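The successive-approximation scheme just described can be sketched in Python for a hypothetical two-level system in which $V$ does not involve the time explicitly, with $\hbar = 1$. The representative $\langle\alpha'|V|\alpha''\rangle_t$ is built from the constant starred elements by the Heisenberg law given after (14); each pass substitutes the previous approximation into the right-hand side of (14) and integrates with the trapezoidal rule. A fourth-order Runge-Kutta integration of (14) is included only as an independent reference. All function names and numbers are illustrative assumptions.

```python
import cmath


def v_t(E, V, t):
    """<a'|V|a''>_t = exp(i(E_a' - E_a'')t) <a'|V|a''>*, with hbar = 1 and V time-independent."""
    n = len(E)
    return [[V[a][b] * cmath.exp(1j * (E[a] - E[b]) * t) for b in range(n)]
            for a in range(n)]


def successive_approximations(E, V, c0, T, steps, passes):
    """Hypothetical helper: iterate c(t) = c0 - i int_0^t V_t c dt, starting from c(t) = c0."""
    n, dt = len(E), T / steps
    Vt = [v_t(E, V, k * dt) for k in range(steps + 1)]
    c = [list(c0) for _ in range(steps + 1)]            # zeroth approximation
    for _ in range(passes):
        rhs = [[sum(Vt[k][a][b] * c[k][b] for b in range(n)) for a in range(n)]
               for k in range(steps + 1)]
        new = [list(c0)]
        for k in range(1, steps + 1):                   # cumulative trapezoidal rule
            new.append([new[-1][a] - 0.5j * dt * (rhs[k - 1][a] + rhs[k][a])
                        for a in range(n)])
        c = new                                         # next approximation
    return c[-1]                                        # value at time T


def runge_kutta(E, V, c0, T, steps):
    """Reference solution of i dc/dt = V_t c by classical fourth-order Runge-Kutta."""
    def f(t, c):
        Vt = v_t(E, V, t)
        return [-1j * sum(Vt[a][b] * c[b] for b in range(len(c))) for a in range(len(c))]

    dt, t, c = T / steps, 0.0, list(c0)
    for _ in range(steps):
        k1 = f(t, c)
        k2 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(c, k1)])
        k3 = f(t + dt / 2, [x + dt / 2 * k for x, k in zip(c, k2)])
        k4 = f(t + dt, [x + dt * k for x, k in zip(c, k3)])
        c = [x + dt / 6 * (p + 2 * q + 2 * r + s)
             for x, p, q, r, s in zip(c, k1, k2, k3, k4)]
        t += dt
    return c


E, V = [0.0, 1.0], [[0.0, 0.05], [0.05, 0.0]]        # assumed two-level example
first = successive_approximations(E, V, [1.0, 0.0], 10.0, 2000, passes=1)
sixth = successive_approximations(E, V, [1.0, 0.0], 10.0, 2000, passes=6)
exact = runge_kutta(E, V, [1.0, 0.0], 10.0, 2000)
print(abs(first[1]) ** 2, abs(sixth[1]) ** 2, abs(exact[1]) ** 2)
```

A single pass gives the first approximation; each further pass adds one more power of $V$, and after six passes the iterate agrees closely with the Runge-Kutta reference for these small values of $VT$.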

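Let the initial value of $\langle\alpha'|\rangle$, i.e. the value at time $t = 0$, be $a_0(\alpha')$, or $a_0'$ say,
for brevity. We shall then have in the first approximation for the value of $\langle\alpha'|\rangle$ at
an arbitrary time $T$,

$$\langle\alpha'|\rangle_T = a_0' - i\hbar^{-1}\sum_{\alpha''}\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,a_0''\,dt = a_0' + a_{1T}',$$

say, $a_1'$ being the first-order correction, whose value at time $T$ is

$$a_{1T}' = -i\hbar^{-1}\sum_{\alpha''}a_0''\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt. \qquad (16)$$

The second approximation at an arbitrary time $T$ will now be

$$\langle\alpha'|\rangle_T = a_0' - i\hbar^{-1}\sum_{\alpha''}\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,\bigl[a_0'' + a_{1t}''\bigr]\,dt = a_0' + a_{1T}' + a_{2T}',$$

where $a_2'$, the second-order correction, has the value at time $T$

$$a_{2T}' = -i\hbar^{-1}\sum_{\alpha''}\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,a_{1t}''\,dt = -\hbar^{-2}\sum_{\alpha''\alpha'''}a_0'''\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\int_0^t\langle\alpha''|V|\alpha'''\rangle_\tau\,d\tau. \qquad (17)$$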


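The probability (15) of the $\alpha$'s having the values $\alpha'$ at any time is now,
to the second order of accuracy,

$$\begin{aligned}
P' &= (a_0' + a_1' + a_2')(\bar a_0' + \bar a_1' + \bar a_2') \\
&= a_0'\bar a_0' + (a_0'\bar a_1' + a_1'\bar a_0') + (a_0'\bar a_2' + a_1'\bar a_1' + a_2'\bar a_0') \\
&= P_0' + P_1' + P_2', \qquad (18)
\end{aligned}$$

$P_0'$ being the initial value of this probability and $P_1'$ and $P_2'$ being the first- and
second-order corrections.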




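Suppose now that we are given, not the initial value $a_0'$ of $\langle\alpha'|\rangle$, but only
the initial probability $P_0'$ of the $\alpha$'s having any specified values $\alpha'$, and want
to calculate the probability at any subsequent time of the $\alpha$'s having specified
values. We now know only the modulus of $a_0'$ and not its phase, so that we must
average over all phases. This averaging results in a considerable simplification in
the expression (18) for $P'$, since this expression is bilinear in $a_0$ and $\bar a_0$ [both $a_1$
and $a_2$ being linear functions of $a_0$ according to (16) and (17)], and thus consists of
a sum of terms of the form $a_0''\bar a_0'''$. The average of $a_0''\bar a_0'''$, or $a_0(\alpha'')\bar a_0(\alpha''')$, will vanish
except when $\alpha'' = \alpha'''$, so that the only surviving terms will be those of the form
$a_0''\bar a_0''$. In this way $P_1'$ at time $T$ reduces to

$$P_{1T}' = a_0'\bar a_{1T}' + a_{1T}'\bar a_0' = a_0'\bar a_0'\Bigl[i\hbar^{-1}\int_0^T\overline{\langle\alpha'|V|\alpha'\rangle_t}\,dt - i\hbar^{-1}\int_0^T\langle\alpha'|V|\alpha'\rangle_t\,dt\Bigr] = 0,$$

since the diagonal element $\langle\alpha'|V|\alpha'\rangle_t$ of the Hermitian matrix $\langle\alpha'|V|\alpha''\rangle_t$ is real.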
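Similarly, $P_2'$ at time $T$ reduces to

$$\begin{aligned}
P_{2T}' &= a_{2T}'\bar a_0' + a_{1T}'\bar a_{1T}' + a_0'\bar a_{2T}' \\
&= -\hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\int_0^T dt\int_0^t d\tau\,\langle\alpha'|V|\alpha''\rangle_t\langle\alpha''|V|\alpha'\rangle_\tau \\
&\quad + \hbar^{-2}\sum_{\alpha''}a_0''\bar a_0''\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\int_0^T\langle\alpha''|V|\alpha'\rangle_t\,dt \\
&\quad - \hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\int_0^T dt\int_0^t d\tau\,\langle\alpha''|V|\alpha'\rangle_t\langle\alpha'|V|\alpha''\rangle_\tau,
\end{aligned}$$

use being made, in dealing with the third term, of the fact that the matrix $\langle\alpha'|V|\alpha''\rangle$
is Hermitian. If we interchange $t$ and $\tau$ in this third term, we can combine it with
the first term to give

$$\begin{aligned}
&-\hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\Bigl\{\int_0^T dt\int_0^t d\tau + \int_0^T d\tau\int_0^\tau dt\Bigr\}\langle\alpha'|V|\alpha''\rangle_t\langle\alpha''|V|\alpha'\rangle_\tau \\
&\qquad = -\hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\int_0^T dt\int_0^T d\tau\,\langle\alpha'|V|\alpha''\rangle_t\langle\alpha''|V|\alpha'\rangle_\tau \\
&\qquad = -\hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\Bigl|\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\Bigr|^2.
\end{aligned}$$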




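Thus our expression for $P_2'$ becomes

$$P_{2T}' = \hbar^{-2}\sum_{\alpha''}\{a_0''\bar a_0'' - a_0'\bar a_0'\}\Bigl|\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\Bigr|^2 = \hbar^{-2}\sum_{\alpha''}\{P_0'' - P_0'\}\Bigl|\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\Bigr|^2,$$

and the probability $P'$ of the $\alpha$'s having the values $\alpha'$ is, to the second order
of accuracy,

$$P_T' = P_0' + \hbar^{-2}\sum_{\alpha''}\{P_0'' - P_0'\}\Bigl|\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\Bigr|^2. \qquad (19)$$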
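Formula (19) can be evaluated directly once the integrals $\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt$ are known. The following Python sketch does this for a hypothetical three-level system with time-independent $V$ and $\hbar = 1$; the function names and numbers are assumptions for illustration. It also shows that the total probability $\sum_{\alpha'}P_T'$ is unchanged, since the terms $\{P_0'' - P_0'\}|\ldots|^2$ cancel in pairs.

```python
import cmath


def time_integrals(E, V, T, steps=4000):
    """J[a][b] = int_0^T <a|V|b>_t dt, with <a|V|b>_t = exp(i(E_a - E_b)t) V[a][b], hbar = 1."""
    n, dt = len(E), T / steps
    J = [[0j] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            f = [V[a][b] * cmath.exp(1j * (E[a] - E[b]) * k * dt)
                 for k in range(steps + 1)]
            J[a][b] = dt * (sum(f) - 0.5 * (f[0] + f[-1]))   # trapezoidal rule
    return J


def probabilities_eq19(P0, E, V, T):
    """P_T' = P_0' + sum over a'' of (P_0'' - P_0') |J[a'][a'']|^2, i.e. eq. (19) with hbar = 1."""
    J = time_integrals(E, V, T)
    n = len(E)
    return [P0[a] + sum((P0[b] - P0[a]) * abs(J[a][b]) ** 2 for b in range(n))
            for a in range(n)]


# Assumed example: three levels, a small Hermitian V, initial probabilities 0.6, 0.3, 0.1.
E = [0.0, 0.7, 1.9]
V = [[0.01, 0.02 + 0.01j, 0.005],
     [0.02 - 0.01j, 0.0, 0.015j],
     [0.005, -0.015j, 0.02]]
P = probabilities_eq19([0.6, 0.3, 0.1], E, V, T=8.0)
print(P, sum(P))
```

Taking the initial probabilities as $(0, 1, 0)$ makes the first entry reduce to the single transition term of the special case that follows.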


This result is capable of a simple interpretation. If we suppose that initially
the $\alpha$'s certainly have the values $\alpha'' \neq \alpha'$, so that $P_0'' = 1$, $P_0''' = 0$ for $\alpha''' \neq \alpha''$
(in which special case the averaging over the phases of the $a_0$'s is not necessary),
then the right-hand side of (19) reduces to the single term

$$\hbar^{-2}\Bigl|\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt\Bigr|^2 = P(\alpha'', \alpha') \qquad (20)$$

say. This may be interpreted as the probability of the system making a transition
from the state $\alpha''$ to the state $\alpha'$ under the influence of the perturbation $V$ during
the interval of time 0 to $T$. It is symmetrical between $\alpha''$ and $\alpha'$. Returning
now to the general case, we see that (19) may be regarded as expressing that
the change in the probability of the $\alpha$'s having the values $\alpha'$ during the time
interval 0 to $T$, namely $P_T' - P_0'$, is made up of the total probability $\sum_{\alpha''}P_0''P(\alpha'', \alpha')$
of the system jumping into the state $\alpha'$ from some other state $\alpha''$, minus the total
probability $P_0'\sum_{\alpha''}P(\alpha', \alpha'')$ of its jumping out of the state $\alpha'$, during this time
interval. Thus the ordinary laws of probability apply, showing that there is no
interference between the different transition processes. If we had not averaged
over the initial phases, there would have been such interference.

The integrand in (20) is the representative in a certain representation of
the perturbing energy at time $t$. This representation is one that is approximately
fixed in the Heisenberg picture, since if we put $V = 0$ it would then be completely
fixed in the Heisenberg picture. Hence we can, without spoiling the order of
accuracy of our result, replace the integral in (20) by $\langle\alpha'|\int_0^T V_t\,dt|\alpha''\rangle$ and obtain
an alternative expression for the transition probability

$$P(\alpha', \alpha'') = \hbar^{-2}\Bigl|\Bigl\langle\alpha'\Bigl|\int_0^T V_t\,dt\Bigr|\alpha''\Bigr\rangle\Bigr|^2. \qquad (21)$$




This provides a simple physical meaning for the non-diagonal elements of 
the matrix representing a dynamical variable if this dynamical variable can be 
regarded as the time integral of a perturbing energy. 


48. Application to Radiation 


In the preceding section a general theory of the perturbation of an atomic system 
was developed, in which the perturbing energy could vary with the time in 
an arbitrary way. A perturbation of this kind can be realized in practice by 
allowing incident electromagnetic radiation to fall on the system. Let us see what 
our result (19) or (20) reduces to in this case. 

If we neglect the effects of the magnetic field of the incident radiation, and if 
we further assume that the wave-lengths of the harmonic components of this 
radiation are all large compared with the dimensions of the atomic system, 
then the perturbing energy is simply the scalar product 


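$$V = (\mathbf{D}, \boldsymbol{\mathcal{E}}), \qquad (22)$$

where $\mathbf{D}$ is the total electric displacement of the system and $\boldsymbol{\mathcal{E}}$ is the electric
force of the incident radiation. We suppose $\boldsymbol{\mathcal{E}}$ to be a given function of the time.
If we take for simplicity the case when the incident radiation is plane polarized with
its electric vector in a certain direction and let $D$ denote the Cartesian component
of $\mathbf{D}$ in this direction, the expression (22) for $V$ reduces to the ordinary product

$$V = D\mathcal{E},$$

where $\mathcal{E}$ is the magnitude of the vector $\boldsymbol{\mathcal{E}}$. The matrix elements of $V$ are

$$\langle\alpha'|V|\alpha''\rangle = \langle\alpha'|D|\alpha''\rangle\,\mathcal{E},$$

since $\mathcal{E}$ is a number. Now $\langle\alpha'|D|\alpha''\rangle$ varies with the time $t$ according to
the Heisenberg law

$$\langle\alpha'|D|\alpha''\rangle = \langle\alpha'|D|\alpha''\rangle_0\,e^{i(H_0'-H_0'')t/\hbar},$$

$\langle\alpha'|D|\alpha''\rangle_0$ being constant, and hence our expression (20) for the transition
probability becomes

$$P(\alpha', \alpha'') = \hbar^{-2}|\langle\alpha'|D|\alpha''\rangle_0|^2\Bigl|\int_0^T\mathcal{E}_t\,e^{i(H_0'-H_0'')t/\hbar}\,dt\Bigr|^2. \qquad (23)$$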
0 


If the incident radiation during the time interval 0 to $T$ is resolved into its
Fourier components, the energy crossing unit area per unit frequency range about
the frequency $\nu$ will be, according to classical electrodynamics,

$$E_\nu = \frac{c}{2\pi}\Bigl|\int_0^T\mathcal{E}_t\,e^{2\pi i\nu t}\,dt\Bigr|^2. \qquad (24)$$


Comparing this with (23), we obtain

$$P(\alpha', \alpha'') = 2\pi c^{-1}\hbar^{-2}|\langle\alpha'|D|\alpha''\rangle_0|^2E_\nu, \qquad (25)$$

where

$$\nu = |H_0' - H_0''|/2\pi\hbar. \qquad (26)$$
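The step from (23) to (25) only uses the definition (24) evaluated at the frequency (26). The following Python sketch checks this numerically for a hypothetical field $\mathcal{E}_t = \mathcal{E}_0\cos 2\pi\nu_0 t$; units are arbitrary, $\hbar = 1$, and the value passed for $c$ is an assumption that cancels between (24) and (25). It also shows that a negative $H_0' - H_0''$ gives the same result for a real field, which is why (26) contains a modulus.

```python
import cmath
import math


def fourier_integral(field, T, nu, steps=4000):
    """int_0^T E_t exp(2 pi i nu t) dt by the trapezoidal rule."""
    dt = T / steps
    vals = [field(k * dt) * cmath.exp(2j * math.pi * nu * k * dt) for k in range(steps + 1)]
    return dt * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def p_eq23(D0, field, T, dE, hbar=1.0):
    """Eq. (23): hbar^-2 |D0|^2 |int E_t exp(i dE t / hbar) dt|^2, with dE = H0' - H0''."""
    nu = dE / (2 * math.pi * hbar)          # exp(i dE t / hbar) = exp(2 pi i nu t)
    return abs(D0) ** 2 * abs(fourier_integral(field, T, nu)) ** 2 / hbar ** 2


def p_eq25(D0, field, T, dE, c, hbar=1.0):
    """Eqs. (24)-(26): 2 pi c^-1 hbar^-2 |D0|^2 E_nu at nu = |dE| / (2 pi hbar)."""
    nu = abs(dE) / (2 * math.pi * hbar)
    E_nu = c / (2 * math.pi) * abs(fourier_integral(field, T, nu)) ** 2   # eq. (24)
    return 2 * math.pi / c * abs(D0) ** 2 * E_nu / hbar ** 2


def field(t):
    """Assumed incident field: amplitude 0.7, frequency nu0 = 0.2 (arbitrary units)."""
    return 0.7 * math.cos(2 * math.pi * 0.2 * t)


dE = 2 * math.pi * 0.2                      # chosen so that (26) gives nu = nu0
print(p_eq23(0.01, field, 20.0, dE), p_eq25(0.01, field, 20.0, -dE, c=137.0))
```

Moving $H_0' - H_0''$ away from $2\pi\hbar\nu_0$ makes both expressions drop sharply, in line with the dependence on the single Fourier component of frequency (26).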


From this result we see in the first place that the transition probability depends
only on that Fourier component of the incident radiation whose frequency $\nu$ is
connected with the change of energy by (26). This gives us Bohr’s Frequency
Condition and shows how the ideas of Niels Bohr’s atomic theory, which was
the forerunner of quantum mechanics, can be fitted in with quantum mechanics.

The present elementary theory does not tell us anything about the energy of 
the field of radiation. It would be reasonable to assume, though, that the energy 
absorbed or liberated by the atomic system in the transition process comes from 
or goes into the component of the radiation with frequency $\nu$ given by (26).
This assumption will be justified by the more complete theory of radiation given 
in Chapter XI. The result (25) is then to be interpreted as the probability of 
the system, if initially in the state of lower energy, absorbing radiation and being 
carried to the upper state, and if initially in the upper state, being stimulated by 
the incident radiation to emit and fall to the lower state. The present theory does 
not account for the experimental fact that the system, if in the upper state with 
no incident radiation, can emit spontaneously and fall to the lower state, but this 
also will be accounted for by the more complete theory of Chapter XI. 

The existence of the phenomenon of stimulated emission was inferred
by Albert Einstein,† long before the discovery of quantum mechanics,
from a consideration of thermodynamic equilibrium between atoms and a field of 
black-body radiation satisfying Planck’s law. Einstein showed that the transition 
probability for stimulated emission must equal that for absorption between 
the same pair of states and deduced a relation connecting this transition probability 
with that for spontaneous emission, which relation is in agreement with the theory 
of Chapter XI. 

The matrix element (a’|D|a”) in (25) plays the part of the amplitude of one 
of the Fourier components of D in the classical theory of a multiply-periodic 
system interacting with radiation. In fact it was the idea of replacing classical 
Fourier components by matrix elements which led Heisenberg to the discovery of 
quantum mechanics in 1925. Heisenberg assumed that the formulae describing 
the interaction with radiation of a system in the quantum theory can be obtained 
from the classical formulae by substituting for the Fourier components of the total 
electric displacement of the system the corresponding matrix elements. According
to this assumption applied to spontaneous emission, a system having an electric
moment $\mathbf{D}$ will, when in the state $\alpha'$, spontaneously emit radiation of frequency
$\nu = (H' - H'')/2\pi\hbar$, where $H''$ is an energy-level, less than $H'$, of some state $\alpha''$,
at the rate

$$\frac{4}{3}\frac{(2\pi\nu)^4}{c^3}|\langle\alpha'|\mathbf{D}|\alpha''\rangle|^2. \qquad (27)$$

The distribution of this radiation over the different directions of emission and
its state of polarization for each direction will be the same as that for a classical
electric dipole of moment equal to the real part of $\langle\alpha'|\mathbf{D}|\alpha''\rangle$. To interpret this rate
of emission of radiant energy as a transition probability, we must divide it by
the quantum of energy of this frequency, namely $2\pi\hbar\nu$, and call it the probability per
unit time of this quantum being spontaneously emitted, with the atomic system
simultaneously dropping to the state $\alpha''$ of lower energy. These assumptions
of Heisenberg are justified by the present radiation theory, supplemented by
the spontaneous transition theory of Chapter XI.

† Einstein, A. (1917) Physikalische Zeitschrift, 18, pp. 121-128 [Einstein, A.; B. L. van der
Waerden (translator and editor) ‘On the quantum theory of radiation.’ Sources of Quantum
Mechanics, North-Holland, Amsterdam, 1968]
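To see the size of the quantities in (27), the following Python sketch divides the emitted power by the quantum $2\pi\hbar\nu$ to obtain a spontaneous transition probability per unit time, as described above. The inputs are assumptions chosen for illustration and are not derived in this section: rounded Gaussian-unit constants, and the standard hydrogen $2p\,(m = 0) \to 1s$ dipole matrix element $(128\sqrt{2}/243)\,e a_0$, with the frequency fixed by an energy difference of $\tfrac34$ of a Rydberg; reduced-mass and fine-structure corrections are ignored.

```python
import math

# Rounded CGS-Gaussian constants (assumed values, not taken from the text).
HBAR = 1.054572e-27      # erg s
C = 2.99792458e10        # cm / s
E_CHARGE = 4.80320e-10   # esu
A0 = 5.29177e-9          # cm, Bohr radius
RYDBERG = 2.179872e-11   # erg


def emitted_power(nu, d):
    """Rate of emission of radiant energy, eq. (27): (4/3)(2 pi nu)^4 c^-3 |d|^2."""
    return 4.0 / 3.0 * (2 * math.pi * nu) ** 4 / C ** 3 * abs(d) ** 2


def spontaneous_rate(nu, d):
    """Probability per unit time: the emitted power divided by the quantum 2 pi hbar nu."""
    return emitted_power(nu, d) / (2 * math.pi * HBAR * nu)


nu = 0.75 * RYDBERG / (2 * math.pi * HBAR)        # frequency for the 2p -> 1s energy gap
d = 128 * math.sqrt(2) / 243 * E_CHARGE * A0     # assumed |<2p, m=0| D_z |1s>|
print(f"nu = {nu:.4e} Hz, rate = {spontaneous_rate(nu, d):.3e} per second")
```

With these inputs the rate comes out at roughly $6.3 \times 10^8$ per second.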


49. Transitions caused by a Perturbation Independent of the Time


The perturbation method of §47 is still valid when the perturbing energy $V$ does
not involve the time $t$ explicitly. Since the total Hamiltonian $H$ in this case
does not involve $t$ explicitly, we could now, if desired, deal with the system by
the perturbation method of §46 and find its stationary states. Whether this method
would be convenient or not would depend on what we want to find out about
the system. If what we have to calculate makes an explicit reference to the time,
e.g. if we have to calculate the wave function at one time when we are given its
value at another time, the method of §47 would be the more convenient one.

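Let us see what the result (20) for the transition probability becomes when $V$
does not involve $t$ explicitly. The matrix element $\langle\alpha'|V|\alpha''\rangle$ now varies with $t$
according to the Heisenberg law and thus its time integral is

$$\int_0^T\langle\alpha'|V|\alpha''\rangle_t\,dt = \langle\alpha'|V|\alpha''\rangle_0\int_0^T e^{i(H_0'-H_0'')t/\hbar}\,dt = \langle\alpha'|V|\alpha''\rangle_0\,\frac{e^{i(H_0'-H_0'')T/\hbar} - 1}{i(H_0'-H_0'')/\hbar},$$

provided $H_0' \neq H_0''$. Thus the transition probability (20) becomes

$$\begin{aligned}
P(\alpha', \alpha'') &= |\langle\alpha'|V|\alpha''\rangle_0|^2\,\frac{\bigl[e^{i(H_0'-H_0'')T/\hbar} - 1\bigr]\bigl[e^{-i(H_0'-H_0'')T/\hbar} - 1\bigr]}{(H_0' - H_0'')^2} \\
&= 2|\langle\alpha'|V|\alpha''\rangle_0|^2\,\bigl[1 - \cos\{(H_0' - H_0'')T/\hbar\}\bigr]/(H_0' - H_0'')^2. \qquad (28)
\end{aligned}$$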
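Formula (28) can be compared with a direct numerical evaluation of the integral in (20). The Python sketch below assumes $\hbar = 1$ and a single constant matrix element $v$ standing for $\langle\alpha'|V|\alpha''\rangle_0$; the function names and numbers are illustrative only. The last printed column is $4|v|^2/(H_0' - H_0'')^2$, an upper bound on (28) because $1 - \cos$ never exceeds 2, so the probability stays small for every $T$ when the two proper-energies differ appreciably.

```python
import cmath
import math


def p_closed_form(v, dE, T):
    """Eq. (28) with hbar = 1: 2|v|^2 (1 - cos(dE T)) / dE^2, where dE = H0' - H0''."""
    return 2 * abs(v) ** 2 * (1 - math.cos(dE * T)) / dE ** 2


def p_from_integral(v, dE, T, steps=20000):
    """Eq. (20) with hbar = 1: |int_0^T v exp(i dE t) dt|^2 by the trapezoidal rule."""
    dt = T / steps
    f = [v * cmath.exp(1j * dE * k * dt) for k in range(steps + 1)]
    return abs(dt * (sum(f) - 0.5 * (f[0] + f[-1]))) ** 2


for dE in (0.2, 1.0, 5.0):                       # assumed energy differences
    print(dE, p_closed_form(0.05, dE, 10.0), p_from_integral(0.05, dE, 10.0),
          4 * 0.05 ** 2 / dE ** 2)               # upper bound, since 1 - cos <= 2
```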




If $H_0''$ differs appreciably from $H_0'$ this transition probability is small and
remains so for all values of $T$. This result is required by the law of the conservation
of energy. The total energy $H$ is constant and hence the proper-energy $H_0$
(i.e. the energy with neglect of the part $V$ due to the perturbation),
being approximately equal to $H$, must be approximately constant. This means
that if $H_0$ initially has the numerical value $H_0'$, at any later time there must be
only a small probability of its having a numerical value differing considerably
from $H_0'$.

On the other hand, when the initial state $\alpha'$ is such that there exists another
state $\alpha''$ having the same or very nearly the same proper-energy $H_0$, the probability
of a transition to the final state $\alpha''$ may be quite large. The case of physical
interest now is that in which there is a continuous range of final states $\alpha''$ having
a continuous range of proper-energy levels $H_0''$ passing through the value $H_0'$
of the proper-energy of the initial state. The initial state must not be one of
the continuous range of final states, but may be either a separate discrete state
or one of another continuous range of states. We shall now have, remembering
the rules of §22 for the interpretation of probability amplitudes with continuous
ranges of states, that, with $P(\alpha', \alpha'')$ having the value (28), the probability
of a transition to a final state within the small range $\alpha''$ to $\alpha'' + d\alpha''$ will
be $P(\alpha', \alpha'')\,d\alpha''$ when the initial state $\alpha'$ is discrete and will be proportional to
this quantity when $\alpha'$ is one of a continuous range.

We may suppose that the $\alpha$'s describing the final state consist of $H_0$ itself
together with a number of other dynamical variables $\beta$, so that we have
a representation like that of §46 for the degenerate case. (The $\beta$'s, however,
need have no meaning for the initial state $\alpha'$.) We shall suppose for definiteness
that the $\beta$'s have only discrete eigenvalues. The total probability of a transition
to a final state $\alpha''$ for which the $\beta$'s have the values $\beta''$ and $H_0$ has any value
(there will be a strong probability of its having a value near the initial value $H_0'$)
will now be (or be proportional to)

$$\begin{aligned}
\int P(\alpha', \alpha'')\,dH_0'' &= 2\int_{-\infty}^{\infty}|\langle\alpha'|V|H_0''\beta''\rangle|^2\,\bigl[1 - \cos\{(H_0'' - H_0')T/\hbar\}\bigr]/(H_0'' - H_0')^2\,dH_0'' \\
&= 2\hbar^{-1}T\int_{-\infty}^{\infty}|\langle\alpha'|V|H_0' + \hbar x/T,\beta''\rangle|^2\,[1 - \cos x]/x^2\,dx \qquad (29)
\end{aligned}$$

if one makes the substitution $(H_0'' - H_0')T/\hbar = x$. For large values of $T$ this
reduces to

$$2\hbar^{-1}T\,|\langle\alpha'|V|H_0'\beta''\rangle|^2\int_{-\infty}^{\infty}[1 - \cos x]/x^2\,dx = 2\pi\hbar^{-1}T\,|\langle\alpha'|V|H_0'\beta''\rangle|^2. \qquad (30)$$




Thus the total probability up to time $T$ of a transition to a final state for which
the $\beta$'s have the values $\beta''$ is proportional to $T$. There is therefore a definite
probability coefficient, or probability per unit time, for the transition process under
consideration, having the value

$$2\pi\hbar^{-1}|\langle\alpha'|V|H_0'\beta''\rangle|^2. \qquad (31)$$


It is proportional to the square of the modulus of the matrix element, associated 
with this transition, of the perturbing energy. 
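The passage from (29) to (30) rests on $\int_{-\infty}^{\infty}(1 - \cos x)/x^2\,dx = \pi$. A minimal Python sketch, assuming $\hbar = 1$, a cut-off $L$ and a step count chosen only for illustration, evaluates this integral numerically and then compares the two sides of (30) for a hypothetical matrix element $v$ and time $T$; the probability per unit time (31) is the factor multiplying $T$.

```python
import math


def kernel(x):
    """(1 - cos x) / x**2, with its limiting value 1/2 at x = 0."""
    if x == 0.0:
        return 0.5
    return (1.0 - math.cos(x)) / (x * x)


def kernel_integral(L=2000.0, n=200000):
    """Approximate int_{-inf}^{inf} (1 - cos x) / x**2 dx, the integral used in (30).

    Simpson's rule on [0, L] (n even), doubled by symmetry; the tail beyond L is
    replaced by int_L^inf dx / x**2 = 1/L, since cos x averages out there.
    """
    h = L / n
    total = kernel(0.0) + kernel(L)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * kernel(k * h)
    return 2.0 * (total * h / 3.0 + 1.0 / L)


def rate_eq31(v, hbar=1.0):
    """Probability per unit time, eq. (31): 2 pi hbar^-1 |v|^2."""
    return 2 * math.pi * abs(v) ** 2 / hbar


integral = kernel_integral()
v, T = 0.01, 50.0                                   # hypothetical matrix element and time
print(integral, 2 * T * abs(v) ** 2 * integral, rate_eq31(v) * T)
```

The product $2\pi T|v|^2 \approx 0.03$ here is still small compared with unity, as the validity condition on $T$ requires.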

In order that the approximations used in deriving (30) may be valid, the time T 
must be not too small and not too large. It must be large compared with 
the periods of the atomic system in order that the evaluation of the integral (29) 
leading to the result (30) may be correct, while it must not be excessively 
large or else the general formula (20) will break down. In fact one could make 
the probability (30) greater than unity by taking T large enough. The upper limit 
to T is fixed by the condition that the probability (20) or (30) must be small 
compared with unity. There is no difficulty in $T$ satisfying both these conditions
simultaneously provided the perturbing energy V is sufficiently small. 


50. The Anomalous Zeeman Effect 


One of the simplest examples of the perturbation method of §46 is the calculation
of the first-order change in the energy-levels of a general atom caused by a uniform
magnetic field. The problem of a hydrogen atom in a uniform magnetic field has
already been dealt with in §43 and was so simple that perturbation theory was
unnecessary. The case of a general atom is not much more complicated when
we make a few approximations such that we can set up a simple model for the atom.

We first of all consider the atom in the absence of the magnetic field along
the lines mentioned in §44 and look for angular momenta that are constants of
the motion. The total angular momentum of the atom, the vector $\mathbf{j}$ say, is certainly
a constant of the motion. This angular momentum may be regarded as the sum
of two parts, the total orbital angular momentum of all the electrons, $\mathbf{l}$ say,
and the total spin angular momentum, $\mathbf{s}$ say. Thus we have $\mathbf{j} = \mathbf{l} + \mathbf{s}$. Now the effect
of the spin magnetic moments on the motion of the electrons is small compared
with the effect of the Coulomb forces and may be neglected as a first approximation.
With this approximation the spin angular momentum of each electron is a constant
of the motion, there being no forces tending to change its orientation. Thus $\mathbf{s}$,
and hence also $\mathbf{l}$, will be constants of the motion. We now have the three constant
angular momenta $\mathbf{l}$, $\mathbf{s}$, and $\mathbf{j}$, related in the same way as the m, pz, and M of §44.




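The magnitudes, $l$, $s$ and $j$ say, of these angular momenta will be given by

$$\begin{aligned}
l + \tfrac{1}{2}\hbar &= (l_x^2 + l_y^2 + l_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},\\
s + \tfrac{1}{2}\hbar &= (s_x^2 + s_y^2 + s_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},\\
j + \tfrac{1}{2}\hbar &= (j_x^2 + j_y^2 + j_z^2 + \tfrac{1}{4}\hbar^2)^{1/2},
\end{aligned}$$

corresponding to equation (12) of Chapter VII. They commute with each other,
and from (36) of that chapter we see that with given numerical values for $l$ and $s$
the possible numerical values for $j$ are

$$l + s,\quad l + s - \hbar,\quad \ldots,\quad |l - s|.$$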


Let us consider a stationary state for which $l$, $s$ and $j$ have definite numerical
values in agreement with the above scheme. The energy of this state will depend
on $l$, but one might think that with neglect of the spin magnetic moments it
would be independent of $s$, and also of the direction of the vector $\mathbf{s}$ relative to $\mathbf{l}$,
and thus of $j$. It will be found in Chapter X, however, that the energy depends very
much on the magnitude $s$ of the vector $\mathbf{s}$, although independent of its direction
when one neglects the spin magnetic moments, on account of certain phenomena
arising from the fact that the electrons are indistinguishable one from another.
There are thus different energy-levels of the system for each different value of $l$
and $s$. This means that $l$ and $s$ are functions of the energy, according to the general
definition of a function given in §11, since the $l$ and $s$ of a stationary state are
fixed when the energy of that state is fixed.

We can now take into account the effect of the spin magnetic moments,
treating it as a small perturbation according to the method of §46. The energy
of the unperturbed system will still be approximately a constant of the motion
and hence $l$ and $s$, being functions of this energy, will still be approximately
constants of the motion. The directions of the vectors $\mathbf{l}$ and $\mathbf{s}$, however, not being
functions of the unperturbed energy, need not now be approximately constants of
the motion and may undergo large secular variations. Since the vector $\mathbf{j}$ is constant,
the only possible variation of $\mathbf{l}$ and $\mathbf{s}$ is a precession about the vector $\mathbf{j}$. We thus
have an approximate model of the atom consisting of the two vectors $\mathbf{l}$ and $\mathbf{s}$ of
constant lengths precessing about their sum $\mathbf{j}$, which is a fixed vector. The energy
is determined mainly by the magnitudes of $\mathbf{l}$ and $\mathbf{s}$ and depends only slightly on
their relative directions, specified by $j$. Thus states with the same $l$ and $s$ and
different $j$ will have only slightly different energy-levels, forming what is called
a multiplet term.

Let us now take this atomic model as our unperturbed system and suppose
it to be subjected to a uniform magnetic field of magnitude $\mathcal{H}$ in the direction of
the $z$-axis. The extra energy due to this magnetic field will consist of a term

$$(e\mathcal{H}/2mc)(m_z + \hbar\sigma_z), \qquad (32)$$

like the last term in equation (34) of Chapter VII, contributed by each electron,
and will thus be altogether

$$(e\mathcal{H}/2mc)\sum(m_z + \hbar\sigma_z) = (e\mathcal{H}/2mc)(l_z + 2s_z) = (e\mathcal{H}/2mc)(j_z + s_z). \qquad (33)$$


This is our perturbing energy V. We shall now use the method of §46 to determine 
the changes in the energy-levels caused by this V. The method will be legitimate 
only provided the field is so weak that V is small compared with the energy 
differences within a multiplet. 

Our unperturbed system is degenerate, on account of the direction of 
the vector j being undetermined. We must therefore take, from the representative 
of V in a Heisenberg representation for the unperturbed system, those matrix 
elements that refer to one particular energy-level for their row and column, 
and obtain the eigenvalues of the matrix thus formed. We can do this best by first 
splitting up V into two parts, one of which is a constant of the unperturbed motion, 
so that its representative contains only matrix elements referring to the same 
unperturbed energy-level for their row and column, while the representative of 
the other contains only matrix elements referring to two different unperturbed 
energy-levels for their row and column, so that this second part does not affect 
the first-order perturbation. The term involving $j_z$ in (33) is a constant of
the unperturbed motion and thus belongs entirely to the first part. For the term
involving $s_z$ we have

$$s_z(j_x^2 + j_y^2 + j_z^2) = j_z(s_xj_x + s_yj_y + s_zj_z) + (s_zj_x - j_zs_x)j_x + (s_zj_y - j_zs_y)j_y,$$

or

$$s_z = j_z\,\frac{j(j+\hbar) - l(l+\hbar) + s(s+\hbar)}{2j(j+\hbar)} - (\gamma_xj_y - \gamma_yj_x)\,\frac{1}{j(j+\hbar)}, \qquad (34)$$

where

$$\gamma_x = s_yj_z - j_ys_z = s_yl_z - l_ys_z,\qquad \gamma_y = s_zj_x - j_zs_x = s_zl_x - l_zs_x. \qquad (35)$$


The first term in this expression for $s_z$ is a constant of the unperturbed motion and
thus belongs entirely to the first part, while the second term, as we shall now see,
belongs entirely to the second part.

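Corresponding to (35) we can introduce

$$\gamma_z = s_xj_y - j_xs_y = s_xl_y - l_xs_y.$$

It can now easily be verified that

$$j_x\gamma_x + j_y\gamma_y + j_z\gamma_z = 0$$

and that

$$j_z\gamma_x - \gamma_xj_z = i\hbar\gamma_y,\qquad j_z\gamma_y - \gamma_yj_z = -i\hbar\gamma_x,\qquad j_z\gamma_z - \gamma_zj_z = 0.$$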


These relations are of the same form as the relations (3), (4), and (5) of
Chapter VII, so that our $\gamma_x$, $\gamma_y$, $\gamma_z$ are connected with the angular momentum $\mathbf{j}$
in the same way in which the $x$, $y$, $z$ of Chapter VII were connected with
the angular momentum $\mathbf{m}$. We can thus take over the analysis of §42, in which
the condition was obtained for the non-vanishing of a matrix element of $x$, $y$,
or $z$ in a representation in which the magnitude of $\mathbf{m}$ is diagonal. We find in this way that
the only non-vanishing matrix elements of $\gamma_x$, $\gamma_y$ and $\gamma_z$ in a representation in
which $j$ is diagonal are those referring to transitions in which $j$ changes by $\pm\hbar$.
The coefficients of $\gamma_x$, $\gamma_y$ in the second term on the right-hand side of (34) commute
with $j$, so that the representative of the whole of this term will contain only matrix
elements referring to transitions in which $j$ changes by $\pm\hbar$, and thus referring to
two different energy-levels of the unperturbed system.

Hence the perturbing energy $V$ becomes, when we neglect that part of it whose
representative consists of matrix elements referring to two different unperturbed
energy-levels,

$$\frac{e\mathcal{H}}{2mc}\,j_z\Bigl\{1 + \frac{j(j+\hbar) - l(l+\hbar) + s(s+\hbar)}{2j(j+\hbar)}\Bigr\}. \qquad (36)$$

The eigenvalues of this give the first-order changes in the energy-levels. We can
make the representative of this expression diagonal by choosing our representation
such that $j_z$ is diagonal, i.e. by taking the basic states to be spatially quantized in
the $z$-direction. The expression (36) then gives us directly the first-order changes
in the energy-levels caused by the magnetic field. This expression is known as
Landé’s formula.†

The result (36) holds only provided the perturbing energy $V$ is small
compared with the energy differences within a multiplet. For larger values
of $V$ a more complicated theory is required. For very strong fields, however,
for which $V$ is large compared with the energy differences within a multiplet,
the theory is again very simple. We may now neglect altogether the energy
of the spin magnetic moments for the atom with no external field, so that
for our unperturbed system the vectors $\mathbf{l}$ and $\mathbf{s}$ themselves are constants
of the motion, and not merely their magnitudes $l$ and $s$. Our perturbing
energy $V$, which is still $(e\mathcal{H}/2mc)(j_z + s_z)$, is now a constant of the motion
for the unperturbed system, so that its eigenvalues give directly the changes in
the energy-levels. These eigenvalues are integral or half-odd integral multiples
of $e\mathcal{H}\hbar/2mc$ according to whether the number of electrons in the atom is even
or odd.


† This is a version of Alfred Landé’s g-formula. Alfred Landé (1951) Quantum Mechanics,
Pitman, p. 208f; Alfred Landé „Termstruktur und Zeemaneffekt der Multipletts“, Zeitschrift für
Physik (1923) 15, pp. 189-205, doi: 10.1007/BF01330473
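For a quick evaluation of the weak-field result (36), the following Python sketch works in units with $\hbar = 1$, so that $j_z$ takes the values $m_j = -j, -j+1, \ldots, j$, and lists the first-order shifts in units of $e\mathcal{H}\hbar/2mc$. It also enumerates the allowed $j$ from the scheme $l + s, l + s - \hbar, \ldots, |l - s|$ given earlier. Exact fractions keep half-odd values exact; the function names are illustrative, not standard.

```python
from fractions import Fraction as F


def allowed_j(l, s):
    """Possible j for given l and s: l + s, l + s - 1, ..., |l - s| (hbar = 1)."""
    j, values = l + s, []
    while j >= abs(l - s):
        values.append(j)
        j -= 1
    return values


def lande_factor(l, s, j):
    """Factor in braces in (36): 1 + [j(j+1) - l(l+1) + s(s+1)] / 2j(j+1); needs j > 0."""
    return 1 + (j * (j + 1) - l * (l + 1) + s * (s + 1)) / (2 * j * (j + 1))


def zeeman_shifts(l, s, j):
    """First-order shifts (36) in units of e H hbar / 2mc, for m_j = -j, ..., j."""
    if j == 0:
        return [F(0)]                 # j_z = 0, so the shift vanishes
    g = lande_factor(l, s, j)
    return [g * (-j + k) for k in range(int(2 * j) + 1)]


l, s = F(1), F(1, 2)                  # assumed example: one electron with l = 1
for j in allowed_j(l, s):
    print(j, lande_factor(l, s, j), zeeman_shifts(l, s, j))
```

For this example the factors are $\tfrac43$ for $j = \tfrac32\hbar$ and $\tfrac23$ for $j = \tfrac12\hbar$, so the two members of the multiplet split into four and two equally spaced levels respectively.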


IX. COLLISION PROBLEMS 


51. General remarks 


In this chapter we shall investigate problems connected with a particle which,
coming from infinity, encounters or ‘collides with’ some atomic system and, 
after being scattered through a certain angle, goes off to infinity again. The atomic 
system which does the scattering we shall call, for brevity, the scatterer. We thus 
have a dynamical system composed of an incident particle and a scatterer 
interacting with each other, which we must deal with according to the laws of 
quantum mechanics, and for which we must, in particular, calculate the probability 
of scattering through any given angle. The problem was first solved by Max Born 
by a method substantially equivalent to that of the next section. We must take 
into account the possibility that the scatterer, considered as a system by itself, 
may have a number of different stationary states and that if it is initially in one 
of these states when the particle arrives from infinity, it may be left in a different 
one when the particle goes off to infinity again. The colliding particle may thus 
induce transitions in the scatterer. 

The Hamiltonian for the whole system of scatterer plus particle will not involve 
the time explicitly, so that this whole system will have stationary states represented 
by simply-periodic solutions of Schrödinger’s wave equation. The meaning of
these stationary states requires a little care to be properly understood. It is 
evident that for any state of motion of the system the particle will spend 
nearly all its time at infinity, so that the time average of the probability of 
the particle being in any finite volume will be zero. Now for a stationary state 
the probability of the particle being in a given finite volume, like any other result 
of observation, must be independent of the time, and hence this probability will 
equal its time average, which we have seen is zero. We shall thus be interested 
only in the relative probabilities of the particle being in different finite volumes, 
their absolute values being all zero. Mathematically we have that if the ψ
representing a stationary state is normalized correctly for physical interpretation,
i.e. such that ψ̄ψ = 1, and if we let Q denote that observable (which is a certain
function of the position of the particle) that is equal to unity if the particle is in
a given finite volume and zero otherwise, then ψ̄Qψ = 0, meaning that the average
value of Q, i.e. the probability of the particle being in the given volume, is zero.
It would therefore be more convenient for us to denote the stationary state by a ψ
normalized to infinity, i.e. for which ψ̄ψ = ∞, the infinity being such as to make
ψ̄Qψ finite. This finite ψ̄Qψ would then give the relative probability of the particle
being in the given volume.

In picturing a state of a system represented by a ψ which is not normalized
correctly for physical interpretation, but for which ψ̄ψ = n say, it may be
convenient to suppose that we have n similar systems all occupying the same
space but with no interaction between them, so that each one follows out its own
motion independently of the others, as we had in the theory of the Gibbs ensemble
in §37. We can then interpret ψ̄αψ, where α is any observable, directly as
the total α for all the n systems. In applying these ideas to the above-mentioned ψ
normalized to infinity, representing a stationary state of the system of scatterer
plus colliding particle, we should picture an infinite number of such systems
with the scatterers all located at the same point and the particles distributed
continuously throughout space. The number of particles in a given finite volume
would be pictured as ψ̄Qψ, Q being the observable defined above, which has
the value unity when the particle is in the given volume and zero otherwise.
If the ψ is represented by a Schrödinger wave function involving the Cartesian
coordinates of the particle, then the square of the modulus of the wave function 
could be interpreted directly as the density of particles in the picture. One must 
remember, however, that each of these particles has its own individual scatterer. 
Different particles may belong to scatterers in different states. There will thus be 
one particle density for each state of the scatterer, namely, the density of those 
particles belonging to scatterers in that state. This is taken account of by the wave 
function involving variables describing the state of the scatterer in addition to those 
describing the position of the particle. 

For determining scattering coefficients we have to investigate stationary states 
of the whole system of scatterer plus particle. For instance, if we want to determine 
the probability of scattering in various directions when the scatterer is initially in 
a given stationary state and the incident particle has initially a given velocity 
in a given direction, we must investigate that stationary state of the whole 
system whose picture, according to the above method, contains at great distances 
from the point of location of the scatterers only particles moving with the given 
initial velocity and direction and belonging each to a scatterer in the given 
initial stationary state, together with particles moving outward from the point 
of location of the scatterers and belonging possibly to scatterers in various 
stationary states. This picture corresponds closely to the actual state of affairs 
in an experimental determination of scattering coefficients, with the difference 
that the picture really describes only one actual system of scatterer plus particle. 


The distribution of outward moving particles at infinity in the picture gives 
us immediately all the information about scattering coefficients that could be 
obtained by experiment. For practical calculations about the stationary state 
described by this picture one may use the perturbation method of §46, taking as 
unperturbed system, for example, that for which there is no interaction between 
the scatterer and particle. 

In dealing with collision problems, a further possibility to be taken into 
consideration is that the scatterer may perhaps be capable of absorbing and 
re-emitting the particle. This possibility arises when there exist one or more states
of absorption of the whole system, a state of absorption being an approximately
stationary state which is closed in the sense of §40 (i.e. for which the probability
of the particle being at a greater distance than r from the scatterer tends to
zero as r → ∞). Since a state of absorption is only approximately stationary,
its property of being closed will be only a transient one, and after a sufficient 
lapse of time there will be a finite probability of the particle being on its way 
to infinity. Physically this means there is a finite probability of spontaneous 
emission of the particle. The fact that we had to use the word ‘approximately’ in 
stating the conditions required for the phenomena of emission and absorption 
to be able to occur shows that these conditions are not expressible in exact 
mathematical language. One can give a meaning to these phenomena only with 
reference to a perturbation method. They occur when the unperturbed system 
(of scatterer plus particle) has stationary states that are closed. The introduction 
of the perturbation now spoils the stationary property of these states and gives 
rise to spontaneous emission and its converse absorption. 

For calculating absorption and emission probabilities it is necessary to deal 
with non-stationary states of the system, in contradistinction to the case for 
scattering coefficients, so that the perturbation method of §47 must be used.
Thus for calculating an emission coefficient we must consider the non-stationary
states of absorption described above. Again, since an absorption is always followed
by a re-emission, it cannot be distinguished from a scattering in any experiment
involving a steady state of affairs, corresponding to a stationary state of the system.
The distinction can be made only by reference to a non-steady state of affairs,
e.g. by use of a stream of incident particles that has an abrupt¹ beginning, so that
the scattered particles will appear immediately after the incident particles meet
the scatterers, while those that have been absorbed and re-emitted will begin
to appear only some time later. This stream of particles would then be the picture
of a certain non-stationary ψ, normalized to infinity, which could be used for
calculating the absorption coefficient. 


¹ [‘abrupt’ substitutes for ‘sharp’.]


52. The Scattering Coefficient 


We shall now consider the calculation of scattering coefficients, taking first the case 
when there is no absorption and emission, which means that our unperturbed 
system has no closed stationary states. We may conveniently take this unperturbed 
system to be that for which there is no interaction between the scatterer and 
particle. Its Hamiltonian will thus be of the form 


$$H_0 = H_s + W, \qquad (1)$$


where H_s is that for the scatterer alone and W that for the particle alone, namely,
with neglect of relativistic variation of mass with velocity, 


$$W = \frac{1}{2m}\,(p_x^2 + p_y^2 + p_z^2). \qquad (2)$$


The perturbing energy V, assumed small, will now be a function of the Cartesian
coordinates of the particle x, y, z, and also, perhaps, of its momenta p_x, p_y, p_z,
together with dynamical variables describing the scatterer.

Since we are now interested only in stationary states of the whole system, 
we can use the perturbation method of §46. Our unperturbed system now
necessarily has a continuous range of energy-levels, since it contains a free
particle, and this gives rise to certain modifications in the perturbation method.
The question of the change in the energy-levels caused by the perturbation,
which was the main question of §46, no longer has a meaning, and the convention
in §46 of using the same number of primes to denote nearly equal eigenvalues of
H₀ and H now¹ becomes redundant. Again, the splitting of energy-levels which
we had in §46 when the unperturbed system is degenerate cannot now arise, 
since if the unperturbed system is degenerate the perturbed one, which must 
also have a continuous range of energy-levels, will also be degenerate to exactly 
the same extent. 

We again use the general scheme of equations developed at the beginning of
§46, equations (1) to (4) there, but we now take our unperturbed stationary state
forming the zero-order approximation to belong to an energy-level H₀' just equal to
the energy-level H' of our perturbed stationary state. We put H₀' = H' = E say.
Thus the a’s introduced in the second of equations (3) §46 are now all zero and
the second of equations (4) there now reads


$$\{E - H_0\}\,\psi_1 = V\psi_0. \qquad (3)$$
Similarly, the third of equations (4) §46 now reads
$$\{E - H_0\}\,\psi_2 = V\psi_1. \qquad (4)$$


¹ [Original: ‘drops out’.]


We shall proceed to solve equation (3) and to obtain the scattering coefficient to 
the first order. We shall need equation (4) later. 

Let α denote a complete set of observables describing the scatterer, which are
constants of the motion when the scatterer is alone and may thus be used for
labelling the stationary states of the scatterer. This requires that H_s shall commute
with the α’s and be a function of them. We can now take a representation of
the whole system in which the α’s and x, y, z, the coordinates of the particle,
are diagonal. This will make H_s diagonal. Let ψ₀ be represented by (xα|0) and
ψ₁ by (xα|1), the single variable x being written in the wave function to denote
x, y, z. In the same way the single differential dx will be written to denote
the product dx dy dz. Equation (3), written in terms of representatives, becomes,
with the help of (1) and (2),


$$\{E - H_s(\alpha') + (\hbar^2/2m)\nabla^2\}(x\alpha'|1) = \sum_{\alpha''}\int (x\alpha'|V|x''\alpha'')\,dx''\,(x''\alpha''|0). \qquad (5)$$


Suppose that the incident particle has the momentum p⁰ and that the initial
stationary state of the scatterer is α⁰. The stationary state ψ₀ of our unperturbed
system is now the one for which p = p⁰ and α = α⁰, and hence its representative
is of the form

$$(x\alpha|0) = \delta_{\alpha\alpha^0}\,e^{i(p^0,x)/\hbar}. \qquad (6)$$


This makes equation (5) reduce to 


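$$\{E - H_s(\alpha') + (\hbar^2/2m)\nabla^2\}(x\alpha'|1) = \int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}$$

or
$$\{k^2 + \nabla^2\}(x\alpha'|1) = F, \qquad (7)$$
where
$$k^2 = 2m\hbar^{-2}\{E - H_s(\alpha')\} \qquad (8)$$
and
$$F = 2m\hbar^{-2}\int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}, \qquad (9)$$

a definite function of x, y, z and α'. We must also have
$$E = H_0' = H_s(\alpha^0) + p^{0\,2}/2m. \qquad (10)$$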


Our problem now is to obtain a solution (xα'|1) of (7) which, for values of x, y, z
denoting points far from the scatterer, represents only outward moving particles.
The square of its modulus, |(xα'|1)|², will then give the density of scattered
particles belonging to scatterers in the state α' when the density of the incident
particles is |(xα⁰|0)|², which is unity. If we transform to polar coordinates r, θ, φ,
equation (7) becomes

$$\left\{k^2 + \frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial}{\partial r}\right) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\,\frac{\partial}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\right\}(r\theta\phi\alpha'|1) = F. \qquad (11)$$


Now F must tend to zero as r → ∞, on account of the physical requirement that
the interaction energy between the scatterer and particle must tend to zero as
the distance between them tends to infinity. If we neglect F in (11) altogether,
an approximate solution for large r is

$$(r\theta\phi\alpha'|1) = u(\theta,\phi,\alpha')\,r^{-1}e^{ikr}, \qquad (12)$$

where u is an arbitrary function of θ, φ and α', since this expression substituted
in the left-hand side of (11) gives a result of order r⁻³. When we do not neglect
F, the solution of (11) will still be of the form (12) for large r, provided F tends
to zero sufficiently rapidly as r → ∞, but the function u will now be definite and
determined by the solution for smaller values of r.
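The statement that (12) satisfies (11) up to terms of order r⁻³ rests on the radial part: r⁻¹e^{ikr} is an exact solution of {k² + r⁻²∂/∂r(r²∂/∂r)}f = 0 for r > 0, so only the angular derivatives of u leave a remainder, each carrying a factor r⁻³. A minimal numerical sketch, assuming central finite differences with an illustrative step `h`, checks the radial statement:

```python
import cmath


def radial_residual(k: float, r: float, h: float = 1e-3) -> complex:
    """Finite-difference value of k^2 f + r^-2 d/dr(r^2 df/dr) for f(r) = e^{ikr}/r.

    Uses r^-2 d/dr(r^2 f') = f'' + (2/r) f' with central differences of step h.
    """
    def f(s: float) -> complex:
        return cmath.exp(1j * k * s) / s

    d1 = (f(r + h) - f(r - h)) / (2 * h)
    d2 = (f(r + h) - 2 * f(r) + f(r - h)) / h**2
    return k**2 * f(r) + d2 + 2 * d1 / r


k = 2.0
for r in (5.0, 50.0, 500.0):
    # compare the residual with the size k^2/r of a single term
    print(r, abs(radial_residual(k, r)), k**2 / r)
```

The residual is limited only by the finite-difference error and stays several orders of magnitude below the individual terms.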

For values α' of the α’s such that k², defined by (8), is positive, the k in (12)
must be chosen to be the positive square root of k², in order that (12) may represent
only outward moving particles, i.e. particles for which the radial component
of momentum p_r, represented by −iħ∂/∂r when it operates to the right,
has a positive value. We now have that the density of scattered particles belonging
to scatterers in state α', equal to the square of the modulus of (12), falls off
with increasing r according to the inverse square law, as is physically necessary,
and their angular distribution is given by |u(θ,φ,α')|². Further, the magnitude,
P' say, of the momentum of these scattered particles must equal kħ, to make
the exponential in (12) of the form e^{iP'r/ħ}, so that their energy is equal to

$$\frac{P'^2}{2m} = \frac{k^2\hbar^2}{2m} = E - H_s(\alpha') = H_s(\alpha^0) - H_s(\alpha') + \frac{p^{0\,2}}{2m}$$

with the help of (8) and (10). This is just the energy of an incident particle, namely
p⁰²/2m, reduced by the increase in energy of the scatterer, namely H_s(α') − H_s(α⁰),
in agreement with the law of conservation of energy. For values α' of the α’s such
that k² is negative there are no scattered particles, the total initial energy being
insufficient for the scatterer to be left in the state α'.
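Equations (8) and (10) fix the magnitude of the scattered momentum for each final state α' of the scatterer. A small sketch, assuming units with ħ = 1 and hypothetical scatterer energy levels, computes P' = kħ when k² > 0 and reports the channel as closed when k² < 0:

```python
import math
from typing import Optional

HBAR = 1.0  # assumption: units with hbar = 1


def scattered_momentum(m: float, P0: float, Hs0: float, Hs1: float) -> Optional[float]:
    """Return P' = k*hbar for the transition alpha0 -> alpha', or None if the channel is closed.

    m   : mass of the particle
    P0  : magnitude of the incident momentum
    Hs0 : scatterer energy H_s(alpha0) before the collision
    Hs1 : scatterer energy H_s(alpha') after the collision
    """
    E = Hs0 + P0**2 / (2 * m)            # total energy, equation (10)
    k2 = 2 * m * (E - Hs1) / HBAR**2     # equation (8)
    if k2 < 0:
        return None                      # total energy insufficient: no scattered particles
    return math.sqrt(k2) * HBAR          # positive root selects the outgoing wave


m, P0 = 1.0, 2.0                          # incident kinetic energy P0^2/2m = 2.0
for Hs1 in (0.0, 1.5, 3.0):               # hypothetical final scatterer energies, with Hs0 = 0
    P1 = scattered_momentum(m, P0, 0.0, Hs1)
    print(Hs1, P1)
```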

We must now evaluate u(θ,φ,α') for a set of values α' for the α’s such that k²
is positive, and obtain the angular distribution of the scattered particles belonging
to scatterers in state α'. It is sufficient to evaluate u for the direction θ = 0 of
the pole of the polar coordinates, since this direction is arbitrary. We make use
of Green’s theorem, which states that for any two functions of position A and B
the volume integral ∫(A∇²B − B∇²A) dx taken over any volume equals the surface
integral ∫(A∂B/∂n − B∂A/∂n) dS taken over the boundary of the volume, ∂/∂n
denoting differentiation along the normal to the surface. We take

$$A = e^{-ikr\cos\theta}, \qquad B = (r\theta\phi\alpha'|1)$$


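and apply the theorem to a large sphere with the origin as centre. The volume
integrand is thus

$$e^{-ikr\cos\theta}\nabla^2(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\nabla^2 e^{-ikr\cos\theta} = e^{-ikr\cos\theta}(\nabla^2 + k^2)(r\theta\phi\alpha'|1) = e^{-ikr\cos\theta}F$$

from (7) or (11), while the surface integrand is, with the help of (12),

$$\begin{aligned}
&e^{-ikr\cos\theta}\frac{\partial}{\partial r}(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\frac{\partial}{\partial r}e^{-ikr\cos\theta} \\
&\quad= e^{-ikr\cos\theta}\left(-\frac{1}{r^2} + \frac{ik}{r}\right)u\,e^{ikr} + \frac{u}{r}\,e^{ikr}\,ik\cos\theta\,e^{-ikr\cos\theta} \\
&\quad= iku\,r^{-1}(1 + \cos\theta)\,e^{ikr(1-\cos\theta)}
\end{aligned}$$

with neglect of r⁻². Hence we get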


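$$\int e^{-ikr\cos\theta}F\,dx = \int_0^{2\pi}d\phi\int_0^{\pi} r^2\sin\theta\,d\theta\; iku\,r^{-1}(1+\cos\theta)\,e^{ikr(1-\cos\theta)} = ikr\int_0^{2\pi}d\phi\int_0^{2}dy\;u(\theta,\phi,\alpha')(2-y)\,e^{ikry},$$

where y = 1 − cos θ, the volume integral on the left being taken over the whole
of space. The right-hand side becomes, on being integrated by parts with respect
to y,

$$\int_0^{2\pi}d\phi\left\{\Big[u(\theta,\phi,\alpha')(2-y)\,e^{ikry}\Big]_{y=0}^{y=2} - \int_0^{2}dy\;e^{ikry}\,\frac{\partial}{\partial y}\big\{u(\theta,\phi,\alpha')(2-y)\big\}\right\}.$$

The second term in the {} brackets is of the order of magnitude of r⁻¹, as would be
revealed by further partial integrations, and may therefore be neglected. We are
thus left with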


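$$\int e^{-ikr\cos\theta}F\,dx = -2\int_0^{2\pi}d\phi\;u(0,\phi,\alpha') = -4\pi\,u(0,\phi,\alpha'),$$

giving the value of u(0,φ,α') for the direction θ = 0. (At the pole θ = 0 the function u
cannot depend on φ, so the φ-integration simply supplies a factor 2π.)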
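The step from the y-integral to the boundary value −2u can be checked numerically. The sketch below assumes a hypothetical smooth amplitude u(y) = 1 + y², independent of φ (so the φ-integration would only add a factor 2π), and uses a composite Simpson rule. It evaluates ikr∫₀²u(y)(2 − y)e^{ikry}dy for growing kr and compares it with −2u(0); the difference shrinks roughly like (kr)⁻¹, which is the neglected second term.

```python
import cmath


def simpson(f, a: float, b: float, n: int) -> complex:
    """Composite Simpson rule on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def u(y: float) -> float:
    """Hypothetical smooth amplitude, written as a function of y = 1 - cos(theta)."""
    return 1.0 + y * y


def surface_term(kr: float, n: int = 100000) -> complex:
    """ikr * integral_0^2 u(y) (2 - y) e^{ikr y} dy, the inner integral of the text."""
    integrand = lambda y: u(y) * (2.0 - y) * cmath.exp(1j * kr * y)
    return 1j * kr * simpson(integrand, 0.0, 2.0, n)


target = -2 * u(0.0)   # boundary contribution at y = 0
for kr in (10.0, 100.0, 1000.0):
    print(kr, abs(surface_term(kr) - target))
```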
This result may be written

$$u(0,\phi,\alpha') = -(4\pi)^{-1}\int e^{-iP'z/\hbar}F\,dx, \qquad (13)$$

since P' = kħ and r cos θ = z, the pole being taken along the z-axis. If the vector p' denotes the momentum of the scattered electrons
coming off in a certain direction (and is thus of magnitude P'), the value of u for
this direction will be

$$u(\theta',\phi',\alpha') = -(4\pi)^{-1}\int e^{-i(p',x)/\hbar}F\,dx,$$


as follows from (13) if one takes this direction to be the pole of the polar
coordinates. This becomes, with the help of (9),

$$\begin{aligned}
u(\theta',\phi',\alpha') &= -(2\pi)^{-1}m\hbar^{-2}\int e^{-i(p',x)/\hbar}\,dx\,(x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar} \\
&= -2\pi m h\,(p'\alpha'|V|p^0\alpha^0),
\end{aligned} \qquad (14)$$

when one makes a transformation from the coordinates x to the momenta p of
the particle, using the transformation function (43) of Chapter V. The single
letter p is here used to denote the three components of momentum.
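The numerical factor in (14) follows from one step that the text leaves implicit. The derivation below assumes that the transformation function (43) of Chapter V has the form (x|p) = h^{-3/2}e^{i(p,x)/ħ}, with h = 2πħ. Under that assumption the double integral equals h³ times the momentum-space matrix element, and the constants combine as follows:

```latex
(x|p) = h^{-3/2} e^{i(p,x)/\hbar}
\;\Longrightarrow\;
\int e^{-i(p',x)/\hbar}\,dx\,(x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}
  = h^{3}\,(p'\alpha'|V|p^0\alpha^0),

u(\theta',\phi',\alpha')
  = -(2\pi)^{-1} m \hbar^{-2} h^{3}\,(p'\alpha'|V|p^0\alpha^0)
  = -(2\pi)^{-1} m \,\frac{4\pi^2}{h^2}\, h^{3}\,(p'\alpha'|V|p^0\alpha^0)
  = -2\pi m h\,(p'\alpha'|V|p^0\alpha^0).
```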

The density of scattered particles belonging to scatterers in state α' is now
given by |u(θ,φ,α')|²/r². Since their velocity is P'/m, the rate at which
these particles appear per unit solid angle about the direction of the vector p' will
be (P'/m)|u(θ',φ',α')|². The density of the incident particles is, as we have seen,
unity, so that the number of incident particles crossing unit area per unit time is
equal to their velocity P⁰/m, where P⁰ is the magnitude of p⁰. Hence the effective
area that must be hit by an incident particle in order to be scattered in a unit
solid angle about the direction p' and then belong to a scatterer in state α' will be

$$\frac{P'}{P^0}\,|u(\theta',\phi',\alpha')|^2 = 4\pi^2m^2h^2\,\frac{P'}{P^0}\,|(p'\alpha'|V|p^0\alpha^0)|^2. \qquad (15)$$
This is the scattering coefficient for transitions α⁰ → α' of the scatterer.
It depends on that matrix element (p'α'|V|p⁰α⁰) of the perturbing energy V
whose column p⁰α⁰ and whose row p'α' refer respectively to the initial and final
states of the unperturbed system, between which the scattering transition process 
takes place. The result (15) is thus in some ways analogous to the result (20) or (21) 
of Chapter VIII, although the numerical coefficients are different in the two cases, 
corresponding to the different natures of the two transition processes. 
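As a hedged check of the constants in (15), the sketch below evaluates the scattering coefficient for a hypothetical scatterer with no internal structure (so α' = α⁰ and P' = P⁰) and a screened-Coulomb (Yukawa) interaction V = g e^{−μr}/r. It assumes units with ħ = 1, h = 2πħ, and the transformation function (x|p) = h^{-3/2}e^{i(p,x)/ħ}, so that (p'|V|p⁰) = h⁻³Ṽ(q), where Ṽ is the ordinary Fourier transform at q = |p' − p⁰|/ħ. The result agrees with the familiar Born formula (m/2πħ²)²|Ṽ(q)|²; the parameter values are illustrative only.

```python
import math

HBAR = 1.0                  # assumption: units with hbar = 1
H = 2 * math.pi * HBAR      # Planck's constant


def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule on [a, b] with n subintervals (n is made even)."""
    n += n % 2
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def yukawa_ft(q: float, g: float, mu: float) -> float:
    """Fourier transform of V(r) = g e^{-mu r}/r: (4 pi/q) * int_0^inf r V(r) sin(q r) dr."""
    integrand = lambda r: g * math.exp(-mu * r) * math.sin(q * r)   # r * V(r) * sin(q r)
    return 4 * math.pi / q * simpson(integrand, 0.0, 40.0 / mu)      # e^{-40} is negligible


def coefficient_eq15(m: float, P: float, theta: float, g: float, mu: float) -> float:
    """Elastic form of (15): 4 pi^2 m^2 h^2 (P'/P0) |(p'|V|p0)|^2 with P' = P0 = P."""
    q = 2 * P * math.sin(theta / 2) / HBAR         # |p' - p0| / hbar
    matrix_element = yukawa_ft(q, g, mu) / H**3    # (p'|V|p0) = h^-3 * V~(q)
    return 4 * math.pi**2 * m**2 * H**2 * abs(matrix_element) ** 2


def born_textbook(m: float, P: float, theta: float, g: float, mu: float) -> float:
    """Familiar Born result (m / 2 pi hbar^2)^2 |V~(q)|^2 with V~(q) = 4 pi g / (mu^2 + q^2)."""
    q = 2 * P * math.sin(theta / 2) / HBAR
    return (m / (2 * math.pi * HBAR**2)) ** 2 * (4 * math.pi * g / (mu**2 + q**2)) ** 2


for theta in (0.3, 1.0, 2.5):
    print(theta, coefficient_eq15(1.0, 2.0, theta, 0.1, 0.5), born_textbook(1.0, 2.0, theta, 0.1, 0.5))
```

The agreement depends on the assumed normalization of the transformation function; with a different convention the powers of h in (15) would shift accordingly.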


53. Solution with the p-Representation 


The result (15) for the scattering coefficient makes a reference only to that 
representation in which the momentum p is diagonal. One would thus expect 
to be able to get a more direct proof of the result by working all the time in 
the p-representation, instead of working in the x-representation and transforming 
at the end to the p-representation, as was done in §52. This would not at first sight 
appear to be a great improvement, as the lack of directness of the x-representation 
method is offset by its greater ‘Anschaulichkeit’¹, it being possible to picture
the square of the modulus of the x-representative of a state as the density
of a stream of particles in process of being scattered. The x-representation
method has, however, other more serious disadvantages. One of the main
applications of the theory of collisions is to the case of photons as incident
particles. Now a photon is not a simple particle but has a polarization. It is
evident from classical electromagnetic theory that a photon with a definite
momentum, i.e. one moving in a definite direction with a definite frequency,
may have a definite state of polarization (linear, circular, etc.), while a photon
with a definite position, which is to be pictured as an electromagnetic disturbance
confined to a very small volume, cannot have any definite polarization. These facts
mean that the polarization observable of a photon commutes with its momentum
but not with its position. This results in the p-representation method being
immediately applicable to the case of photons, it being only necessary to
introduce the polarizing variable into the representatives and treat it along
with the α’s describing the scatterer, while the x-representation method is not
applicable. Further, in dealing with photons, it is necessary to take the relativistic
variation of mass with velocity into account. This can easily be done in
the p-representation method, but not so easily in the x-representation method.

¹ [Clarity.]

Equation (3) still holds when the relativistic variation of mass with velocity is 
taken into account for the particle, but W is now given by 


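$$W = c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}, \qquad (16)$$


instead of by (2). Written in terms of p-representatives, equation (3) becomes
$$\{E - H_s(\alpha') - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0),$$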


W being here understood as a definite function of p_x, p_y, p_z given by (16). This may
be written

$$\{W' - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0), \qquad (17)$$
where
$$W' = E - H_s(\alpha'), \qquad (18)$$

which is the energy required by the law of conservation of energy for a scattered
particle belonging to a scatterer in state α'. The p-representative of ψ₀, obtained by
transforming (6) with the transformation function (43) of Chapter V, is

$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0), \qquad (19)$$

as may be verified most easily by transforming this back to the x-representation.
The δ(p − p⁰) means the product

$$\delta(p_x - p_x^0)\,\delta(p_y - p_y^0)\,\delta(p_z - p_z^0).$$
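The forward transformation behind (19) can be written out in one line. The derivation assumes, as in the discussion of (14), that (p|x) = h^{-3/2}e^{-i(p,x)/ħ} with h = 2πħ, and uses the standard Fourier representation ∫e^{i(p⁰−p,x)/ħ}dx = (2πħ)³δ(p − p⁰):

```latex
(p\alpha|0) = \int (p|x)\,dx\,(x\alpha|0)
  = h^{-3/2}\,\delta_{\alpha\alpha^0}\int e^{-i(p,x)/\hbar}\,e^{i(p^0,x)/\hbar}\,dx
  = h^{-3/2}\,\delta_{\alpha\alpha^0}\,(2\pi\hbar)^{3}\,\delta(p - p^0)
  = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0).
```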


Equation (17) now becomes
$$\{W' - W\}(p\alpha'|1) = h^{3/2}(p\alpha'|V|p^0\alpha^0). \qquad (20)$$


We now make a canonical transformation from the Cartesian coordinates p_x,
p_y, p_z of p to its polar coordinates P, ω, χ, given by

$$p_x = P\cos\omega, \qquad p_y = P\sin\omega\cos\chi, \qquad p_z = P\sin\omega\sin\chi.$$

If in the new representation we take the weight function P² sin ω, then the weight
attached to any volume of p-space will be the same as in the previous
p-representation, so that the canonical transformation will mean simply
a relabelling of the rows and columns of the matrices without any alteration of
the matrix elements or of the coordinates representing a state. Thus (20) will
become in the new representation

$$\{W' - W\}(P\omega\chi\alpha'|1) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0), \qquad (21)$$
W being now a function of the single variable P. 

The coefficient of (Pωχα'|1), namely {W' − W}, is now simply a multiplying
factor and not a differential operator as it was with the x-representation method.
We can therefore divide out by this factor and obtain an explicit expression
for (Pωχα'|1). When, however, α' is such that W', defined by (18), is greater
than mc², this factor will have the value zero for a certain point in the domain
of the variable P, namely the point P = P', given in terms of W' by (16).
The function (Pωχα'|1) will then have a singularity at this point. This singularity
shows that (Pωχα'|1) represents an infinite number of particles moving about at
great distances from the scatterers with energies indefinitely close to W', and it is
therefore this singularity that we have to study to get the angular distribution of
the particles at infinity.

The result of dividing out (21) by the factor {W' − W} is

$$(P\omega\chi\alpha'|1) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)/\{W' - W\} + \lambda(\omega,\chi,\alpha')\,\delta(W' - W), \qquad (22)$$

where λ is an arbitrary function of ω, χ and α', since when an arbitrary multiple of
δ(W' − W) is multiplied by {W' − W} the product will vanish. To give a meaning
to the first term on the right-hand side of (22), we make the convention that
its integral with respect to P over a range that includes the value P' is the limit
when ε → 0 of the integral when the small domain P' − ε to P' + ε is excluded from
the range of integration. This is sufficient to make the meaning of (22) precise,
since we are interested effectively only in the integrals of the representatives
of states when the representation has continuous ranges of rows and columns.


We see that equation (21) is inadequate to determine the representative (Pωχα'|1)
completely, on account of the arbitrary function occurring in (22). We must
choose this λ such that (Pωχα'|1) represents only outward moving particles,
since we want the only inward moving particles to be those represented by (19).

Let us take first the general case when the representative (Pωχ|) of a state of
the particle satisfies an equation of the type

$$\{W' - W\}(P\omega\chi|) = f(P,\omega,\chi), \qquad (23)$$

where f(P,ω,χ) is any function of P, ω and χ, and W' is a number greater
than mc², so that (Pωχ|) is of the form

$$(P\omega\chi|) = f(P,\omega,\chi)/(W' - W) + \lambda(\omega,\chi)\,\delta(W' - W), \qquad (24)$$

and let us determine now what λ must be in order that (Pωχ|) may represent
only outward moving particles. We can do this by transforming (Pωχ|)
to the x-representation, or rather the (rθφ)-representation, and comparing it
with (12) for large values of r. The transformation function is

$$(r\theta\phi|P\omega\chi) = h^{-3/2}e^{i(p,x)/\hbar} = h^{-3/2}e^{iPr[\cos\omega\cos\theta + \sin\omega\sin\theta\cos(\chi-\phi)]/\hbar}.$$
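For the direction θ = 0 we find, on integrating by parts with respect to ω,

$$\begin{aligned}
(r\theta\phi|) &= h^{-3/2}\int_0^\infty P^2\,dP\int_0^{2\pi}d\chi\int_0^{\pi}\sin\omega\,d\omega\;e^{iPr\cos\omega/\hbar}\,(P\omega\chi|) \\
&= ih^{-1/2}(2\pi r)^{-1}\int_0^\infty P\,dP\int_0^{2\pi}d\chi\left\{\Big[e^{iPr\cos\omega/\hbar}(P\omega\chi|)\Big]_{\omega=0}^{\omega=\pi} - \int_0^{\pi}e^{iPr\cos\omega/\hbar}\,\frac{\partial}{\partial\omega}(P\omega\chi|)\,d\omega\right\}.
\end{aligned}$$

The second term in the {} brackets is of order r⁻², as may be verified by further
partial integrations with respect to ω, and can therefore be neglected. We are
left with

$$\begin{aligned}
(r\theta\phi|) &= ih^{-1/2}(2\pi r)^{-1}\int_0^\infty P\,dP\int_0^{2\pi}d\chi\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\} \\
&= ih^{-1/2}r^{-1}\int_0^\infty P\,dP\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\},
\end{aligned} \qquad (25)$$

the χ-integration giving simply a factor 2π, since at ω = 0 and ω = π the representative cannot depend on χ.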


When we substitute for (Pωχ|) its value given by (24), the first term in
the integrand in (25) gives

$$ih^{-1/2}r^{-1}\int_0^\infty P\,dP\,e^{-iPr/\hbar}\{f(P,\pi,\chi)/(W' - W) + \lambda(\pi,\chi)\,\delta(W' - W)\}. \qquad (26)$$


The term involving δ(W' − W) here may be integrated immediately and gives,
when one uses the relation P dP = W dW/c², which follows from (16),

$$ih^{-1/2}r^{-1}\int \frac{W\,dW}{c^2}\,e^{-iPr/\hbar}\,\lambda(\pi,\chi)\,\delta(W' - W) = ih^{-1/2}c^{-2}r^{-1}W'\lambda(\pi,\chi)\,e^{-iP'r/\hbar}. \qquad (27)$$

To integrate the other term in (26) we use the formula that

$$\int_0^\infty g(P)\,\frac{e^{-iPr/\hbar}}{P' - P}\,dP = g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\,dP, \qquad (28)$$

with neglect of terms involving r⁻¹, for any continuous function g(P),
which formula holds since ∫₀^∞ K(P)e^{−iPr/ħ} dP is of order r⁻¹ for any continuous
function K(P) and since the difference

$$\frac{g(P) - g(P')}{P' - P}$$

is continuous. The right-hand side of (28), when evaluated with neglect of terms
involving r⁻¹, and also with neglect of the small domain P' − ε to P' + ε in the domain
of integration, gives

$$\begin{aligned}
g(P')\int_0^\infty \frac{e^{-iPr/\hbar}}{P' - P}\,dP &= g(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{e^{i(P'-P)r/\hbar}}{P' - P}\,dP \\
&= ig(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{\sin\{(P'-P)r/\hbar\}}{P' - P}\,dP \\
&= i\pi g(P')\,e^{-iP'r/\hbar}.
\end{aligned} \qquad (29)$$
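The last step of (29) uses ∫_{−∞}^{∞} sin{(P' − P)r/ħ}/(P' − P) dP = π for r > 0, the cosine part contributing nothing once the small domain around P' is excluded. A quick numerical sketch, assuming a hypothetical finite cut-off L on |P' − P| in place of the infinite range, pairs the points P' ± s symmetrically (which is what excluding P' − ε to P' + ε amounts to as ε → 0) and shows the integral approaching iπe^{−iP'r/ħ}:

```python
import cmath
import math


def sine_integral(x_max: float, n: int = 200000) -> float:
    """Simpson estimate of int_0^{x_max} sin(x)/x dx; the integrand tends to 1 at x = 0."""
    f = lambda x: 1.0 if x == 0.0 else math.sin(x) / x
    n += n % 2
    h = x_max / n
    total = f(0.0) + f(x_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3


def pv_integral(P_prime: float, r: float, L: float, hbar: float = 1.0) -> complex:
    """Principal value of int e^{-iPr/hbar}/(P' - P) dP over P' - L < P < P' + L.

    Pairing P = P' - s with P = P' + s gives e^{-iP'r/hbar} * 2i * int_0^L sin(s r/hbar)/s ds,
    and x = s r/hbar turns the last integral into int_0^{L r/hbar} sin(x)/x dx.
    """
    return cmath.exp(-1j * P_prime * r / hbar) * 2j * sine_integral(L * r / hbar)


P_prime, L = 1.5, 1.0   # illustrative values; L is a hypothetical cut-off
for r in (10.0, 100.0, 1000.0):
    exact = 1j * math.pi * cmath.exp(-1j * P_prime * r)
    print(r, abs(pv_integral(P_prime, r, L) - exact))
```

The error falls off roughly like 2/(Lr/ħ), consistent with the neglect of terms of order r⁻¹.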


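In our present example g(P) is
$$g(P) = ih^{-1/2}r^{-1}P f(P,\pi,\chi)(P' - P)/(W' - W),$$
which has the limiting value when P = P'
$$g(P') = ih^{-1/2}r^{-1}P'f(P',\pi,\chi)\,W'/c^2P' = ih^{-1/2}c^{-2}r^{-1}W'f(P',\pi,\chi),$$
since (W' − W)/(P' − P) tends to dW/dP = c²P'/W' at P = P', by (16).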


Substituting this in (29) and adding on the expression (27), we obtain the following
value for the integral (26):

$$h^{-1/2}c^{-2}r^{-1}W'\{-\pi f(P',\pi,\chi) + i\lambda(\pi,\chi)\}\,e^{-iP'r/\hbar}. \qquad (30)$$
Similarly the second term in the integrand in (25) gives
$$h^{-1/2}c^{-2}r^{-1}W'\{-\pi f(P',0,\chi) - i\lambda(0,\chi)\}\,e^{iP'r/\hbar}. \qquad (31)$$


The sum of these two expressions is the value of (rθφ|) when r is large.


We require that (rθφ|) shall represent only outward moving particles, and hence
it must be of the form of a multiple of e^{iP'r/ħ}. Thus (30) must vanish, so that

$$\lambda(\pi,\chi) = -i\pi f(P',\pi,\chi). \qquad (32)$$

We see in this way that the condition that (rθφ|) shall represent only outward
moving particles in the direction θ = 0 fixes the value of λ for the opposite direction
ω = π. Since the direction θ = 0 or ω = 0 of the pole of our polar coordinates is
not in any way singular, we can generalize (32) to

$$\lambda(\omega,\chi) = -i\pi f(P',\omega,\chi), \qquad (33)$$

which gives the value of λ for an arbitrary direction. This value substituted in (24)
gives a result that may be written

$$(P\omega\chi|) = f(P,\omega,\chi)\{1/(W' - W) - i\pi\,\delta(W' - W)\}, \qquad (34)$$

since one can substitute P' for P in the coefficient of a term involving δ(W' − W) as
a factor without changing the value of the term. The condition that (Pωχ|) shall
represent only outward moving particles is thus that it shall contain the factor

$$\{1/(W' - W) - i\pi\,\delta(W' - W)\}. \qquad (35)$$


It is interesting to note that this factor is of the form of the right-hand side of 
equation (17) of Chapter IV. 
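A standard way to read the outgoing-wave factor (35), not stated in the text and offered here only as an illustration, is as the limit ε → 0⁺ of 1/(W' − W + iε), since 1/(x + iε) tends to P(1/x) − iπδ(x) (the Sokhotski-Plemelj relation). The sketch checks this numerically with x standing for W' − W and a hypothetical smooth test function φ(x) = (1 + x)e^{−x²}, whose principal-value integral against 1/x is √π and whose value at the origin is 1:

```python
import math


def phi(x: float) -> float:
    """Hypothetical smooth test function with P-int phi(x)/x dx = sqrt(pi) and phi(0) = 1."""
    return (1.0 + x) * math.exp(-x * x)


def regularized_integral(eps: float, half_width: float = 8.0, n: int = 160000) -> complex:
    """Simpson estimate of int phi(x) / (x + i eps) dx over [-half_width, half_width]."""
    n += n % 2
    h = 2 * half_width / n
    f = lambda x: phi(x) / (x + 1j * eps)
    total = f(-half_width) + f(half_width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(-half_width + i * h)
    return total * h / 3


# Sokhotski-Plemelj: 1/(x + i eps) -> P(1/x) - i pi delta(x) as eps -> 0+
expected = math.sqrt(math.pi) - 1j * math.pi * phi(0.0)
for eps in (1e-1, 1e-2, 1e-3):
    print(eps, abs(regularized_integral(eps) - expected))
```

The discrepancy shrinks roughly in proportion to ε, as expected for a smooth test function.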

With λ given by (33), expression (30) vanishes and the value of (rθφ|) for
large r is given by expression (31) alone, thus

$$(r\theta\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}W'f(P',0,\chi)\,e^{iP'r/\hbar}.$$
This may be generalized to
$$(r\theta\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}W'f(P',\omega,\chi)\,e^{iP'r/\hbar},$$


giving the value of (rθφ|) for any direction θ, φ in terms of f(P',ω,χ) for the same
direction labelled by ω, χ. This is of the form (12) with

$$u(\theta,\phi) = -2\pi h^{-1/2}c^{-2}W'f(P',\omega,\chi)$$

and thus represents a distribution of outward moving particles of momentum P'
whose number is

$$\frac{c^2P'}{W'}\,|u|^2 = 4\pi^2h^{-1}c^{-2}W'P'\,|f(P',\omega,\chi)|^2 \qquad (36)$$

per unit solid angle per unit time. This distribution is the one represented by
the (Pωχ|) of (34).
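The count (36) can be read, as in §52 where the speed was P'/m, as the speed of the scattered particles times |u|². Assuming that the speed is the group velocity dW/dP obtained from (16) at P = P' (a step the text does not spell out), the arithmetic runs:

```latex
W = c\,(m^2c^2 + P^2)^{1/2}
  \;\Longrightarrow\;
  \frac{dW}{dP} = \frac{c^2 P}{W},

|u|^2 = 4\pi^2 h^{-1} c^{-4} W'^2\,|f(P',\omega,\chi)|^2
  \;\Longrightarrow\;
  \frac{c^2 P'}{W'}\,|u|^2 = 4\pi^2 h^{-1} c^{-2}\, W' P'\,|f(P',\omega,\chi)|^2 .
```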


From this general result we can infer that, whenever we have a representative
(Pωχ|) representing only outward moving particles and satisfying an equation of
the type (23), the number per unit solid angle per unit time of these particles is
given by (36). If this (Pωχ|) occurs in a problem in which the number of incident
particles is one per unit volume, the incident flux is c²P⁰/W⁰, W⁰ denoting the energy
of an incident particle, and dividing (36) by it gives a scattering coefficient
of amount

$$4\pi^2h^{-1}\,\frac{W^0W'P'}{c^4P^0}\,|f(P',\omega,\chi)|^2. \qquad (37)$$
It is only the value of the function f(P,ω,χ) for the point P = P' that is
of importance.

If we now apply this general theory to our equations (21) and (22), we have

$$f(P,\omega,\chi) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0).$$
Hence from (37) the scattering coefficient is

$$4\pi^2h^2\,\frac{W^0W'P'}{c^4P^0}\,|(P'\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)|^2. \qquad (38)$$


If one neglects relativity and puts W⁰W'/c⁴ = m², this result reduces to
the result (15) obtained in the preceding section by means of Green’s theorem.
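The reduction of (38) to (15) can be illustrated numerically: for the same matrix element the two differ only by the factor W⁰W'/(m²c⁴), which tends to 1 when both momenta are small compared with mc. The sketch assumes units with c = 1 and illustrative momenta; it also checks that the relativistic speed c²P/W is the derivative of (16).

```python
import math

C = 1.0  # assumption: units with c = 1


def energy(m: float, P: float) -> float:
    """Relativistic energy (16): W = c (m^2 c^2 + P^2)^(1/2)."""
    return C * math.sqrt(m**2 * C**2 + P**2)


def speed(m: float, P: float) -> float:
    """Particle speed dW/dP = c^2 P / W, which replaces P/m of the non-relativistic case."""
    return C**2 * P / energy(m, P)


def relativistic_factor(m: float, P0: float, P1: float) -> float:
    """Ratio of the prefactor in (38) to that in (15): W0 W' / (m^2 c^4)."""
    return energy(m, P0) * energy(m, P1) / (m**2 * C**4)


m = 1.0
for P in (1e-3, 1e-1, 1.0, 10.0):
    print(P, speed(m, P), relativistic_factor(m, P, P))
```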


54. Dispersive Scattering 


We shall now determine the scattering when the incident particle is capable of 
being absorbed, that is, when our unperturbed system of scatterer plus particle has 
closed stationary states with the particle absorbed. The existence of these closed 
states for the unperturbed system will be found to have a considerable effect on
the scattering for the perturbed system, and indeed an effect that depends very 
much on the energy of the incident particle, giving rise to the phenomenon of 
dispersion in optics when the incident particle is taken to be a photon. 

We use a representation for which the basic states are the stationary
states of the unperturbed system, as was the case for the p-representation of
the preceding section. We take these stationary states to be the states ψ(p',α')
for which the particle has a definite momentum p' and the scatterer is in
a definite state α', together with the closed states, ψ_k say, which form a separate
discrete set, and assume that these states are all independent and orthogonal.
This assumption is probably not justifiable when the particle is an electron or
atomic nucleus, since in this case for an absorbed state ψ_k the particle will still
certainly be somewhere, so that one would expect to be able to expand ψ_k in
terms of the eigen-ψ’s ψ(x',α') of x, y, z and the α’s, and hence also in terms
of the ψ(p',α'). On the other hand, when the particle is a photon it will
no longer exist for the absorbed states, which are then certainly independent
of and orthogonal to the states ψ(p',α') for which the particle does exist.
Thus the assumption is justified in this case, which is the important practical one.

The representative of a state will now consist of a discrete set of numbers (k|)
referring to the basic states ψ_k, together with the three-dimensional continuous
ranges of numbers (p'α'|) referring to the ψ(p',α'), there being one such
range for each set of values α' for the α’s. Similarly, the matrices representing
dynamical variables will now contain discrete rows and columns labelled by k
together with continuous ranges labelled by (p,α). Thus, for example, the matrix
representing V, the perturbing energy, will have elements (k'|V|k''), (k'|V|p''α''),
(p'α'|V|k''), and (p'α'|V|p''α'').

Since we are concerned with scattering, we must still deal with stationary states 
of the whole system. We shall now, however, have to work to the second order 
of accuracy, so that we cannot use merely the first-order equation (3), but must 
use also (4). Equation (3) becomes, when written in terms of representatives in 
our present representation, 


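$$
\begin{aligned}
\{W'-W\}(p\alpha'|1) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0) + \sum_{k''}(p\alpha'|V|k'')(k''|0),\\
\{E-E_k\}(k|1) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|0) + \sum_{k''}(k|V|k'')(k''|0),
\end{aligned}
\tag{39}
$$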


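where $W'$ is the function of $E$ and the $\alpha'$'s given by (18) and $E_k$ is the energy of the stationary state $\psi_k$ of the unperturbed system. Similarly, equation (4) becomes

$$
\begin{aligned}
\{W'-W\}(p\alpha'|2) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|1) + \sum_{k''}(p\alpha'|V|k'')(k''|1),\\
\{E-E_k\}(k|2) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|1) + \sum_{k''}(k|V|k'')(k''|1).
\end{aligned}
\tag{40}
$$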

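The unperturbed stationary state $\psi_0$ will now be represented by

$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p-p^0), \qquad (k|0) = 0, \tag{41}$$

instead of merely by (19), so (39) reduces to

$$\{W'-W\}(p\alpha'|1) = h^{3/2}\,(p\alpha'|V|p^0\alpha^0), \tag{42}$$

$$\{E-E_k\}(k|1) = h^{3/2}\,(k|V|p^0\alpha^0). \tag{43}$$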


We may assume that the matrix elements $(k'|V|k'')$ of $V$ vanish, since these matrix elements are not essential to the phenomena under investigation, and if they did not vanish it would mean simply that the absorbed states $\psi_k$ had not been suitably chosen. We shall further assume that the matrix elements $(p'\alpha'|V|p''\alpha'')$ are of the second order of smallness when the matrix elements $(k'|V|p''\alpha'')$, $(p'\alpha'|V|k'')$ are taken to be of the first order of smallness. This assumption will be justified for the case of photons in Chapter XI. We now have from (43) and (42) that $(k|1)$ is of the first order of smallness, provided $E$ does not lie near one of the discrete set of energy-levels $E_k$, and $(p\alpha'|1)$ is of the second order. The value of $(p\alpha'|2)$ to the second order will thus be given, from the first of equations (40), by

$$\{W'-W\}(p\alpha'|2) = h^{3/2}\sum_{k''}\frac{(p\alpha'|V|k'')(k''|V|p^0\alpha^0)}{E-E_{k''}}.$$


The total correction to the second order, arising partly from $(p\alpha'|1)$ and partly from $(p\alpha'|2)$, therefore satisfies

$$\{W'-W\}\{(p\alpha'|1)+(p\alpha'|2)\} = h^{3/2}\Big\{(p\alpha'|V|p^0\alpha^0) + \sum_k\frac{(p\alpha'|V|k)(k|V|p^0\alpha^0)}{E-E_k}\Big\}.$$

This equation is of the type (23), provided $\alpha'$ is such that $W' > mc^2$, which means that $\alpha'$ as a final state for the scatterer is not inconsistent with the law of conservation of energy. We can therefore infer from the general result (37) that the scattering coefficient is

$$\frac{4\pi^2h^2W^0W'P'}{c^4P^0}\,\Big|(p'\alpha'|V|p^0\alpha^0) + \sum_k\frac{(p'\alpha'|V|k)(k|V|p^0\alpha^0)}{E-E_k}\Big|^2. \tag{44}$$


The scattering may now be considered as composed of two parts, a part that arises from the matrix element $(p'\alpha'|V|p^0\alpha^0)$ of the perturbing energy and a part that arises from the matrix elements $(p'\alpha'|V|k)$ and $(k|V|p^0\alpha^0)$. The first part, which is the same as our previously obtained result (38), may be called the true scattering. The second part may be considered as arising from an absorption of the incident particle into some state $k$, followed immediately by a re-emission in a different direction. The fact that we have to add the two terms before taking the square of the modulus denotes interference between the two kinds of scattering. There is no experimental way of separating the two kinds, the distinction between them being only mathematical.
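The interference between the two kinds of scattering can be seen in a few lines. The sketch below evaluates the bracket in (44) for invented matrix elements; the function name `scattering_coefficient` and all numbers are illustrative assumptions, and the prefactor $4\pi^2h^2W^0W'P'/(c^4P^0)$ is lumped into a single number.

```python
def scattering_coefficient(prefactor, v_true, resonant_terms, E):
    """Return prefactor * |amplitude|**2, with the amplitude built as in (44).

    prefactor      -- stands for 4 pi^2 h^2 W0 W' P' / (c^4 P0), a plain number here
    v_true         -- the matrix element (p'a'|V|p0a0), giving the 'true' scattering
    resonant_terms -- list of (E_k, product) pairs with product = (p'a'|V|k)(k|V|p0a0)
    E              -- total energy, assumed not to coincide with any E_k
    """
    amplitude = complex(v_true)
    for E_k, product in resonant_terms:
        amplitude += product / (E - E_k)   # absorption into k followed by re-emission
    return prefactor * abs(amplitude) ** 2


# Invented numbers: one absorbed state at E_k = 2.0, total energy E = 1.0.
true_only = scattering_coefficient(1.0, 1.0, [], E=1.0)
resonant_only = scattering_coefficient(1.0, 0.0, [(2.0, 1.0)], E=1.0)
both = scattering_coefficient(1.0, 1.0, [(2.0, 1.0)], E=1.0)

print(true_only, resonant_only, both)   # 1.0 1.0 0.0
```

Each part alone gives 1.0, yet together they cancel, because the amplitudes are added before the modulus is squared.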


55. Resonance Scattering 


Suppose the energy of the incident particle to be varied continuously while the initial state $\alpha^0$ of the scatterer is kept fixed, so that the total energy $E$ varies continuously. The formula (44) now shows that as $E$ approaches one of the discrete set of energy-levels $E_k$, the scattering becomes very large. In fact, according to formula (44) the scattering should be infinite when $E$ is exactly equal to an $E_k$. An infinite scattering coefficient is, of course, physically impossible, so that we can infer that the approximations used in deriving (44) are no longer legitimate when $E$ is close to an $E_k$. To investigate the scattering in this case we must therefore go back to the exact equation


$$\{E - H_0\}\psi = V\psi,$$


which is the same as (2) of Chapter VIII, and use a different method of 
approximating to its solution. This exact equation, written in terms of 
representatives, becomes 


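$$
\begin{aligned}
\{W'-W\}(p\alpha'|) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(p\alpha'|V|k'')(k''|),\\
\{E-E_k\}(k|) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(k|V|k'')(k''|).
\end{aligned}
\tag{45}
$$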


Let us take one particular $E_k$ and consider the case when $E$ is close to it. The large term in the scattering coefficient (44) now arises from those elements of the matrix representing $V$ that lie in row $k$ or in column $k$, i.e. those of the type $(k|V|p\alpha)$ or $(p\alpha|V|k)$. The scattering arising from the other matrix elements of $V$ is of a smaller order of magnitude. This suggests that in our exact equations (45) we should make the approximation of neglecting all the matrix elements of $V$ except the important ones, which are those of the type $(p\alpha'|V|k)$ or $(k|V|p\alpha')$, where $\alpha'$ is a state of the scatterer that does not have so much energy as to be disallowed as a final state by the law of conservation of energy. These equations then reduce to


$$
\begin{aligned}
\{W'-W\}(p\alpha'|) &= (p\alpha'|V|k)(k|),\\
\{E-E_k\}(k|) &= \sum_{\alpha'}\int (k|V|p\alpha')\,dp\,(p\alpha'|),
\end{aligned}
\tag{46}
$$

the $\alpha'$ summation being over those values of $\alpha'$ for which $W'$ given by (18) is $> mc^2$. These equations are now sufficiently simple for us to be able to solve exactly without further approximation.

From the first of equations (46) we obtain by division 


$$(p\alpha'|) = \frac{(p\alpha'|V|k)(k|)}{W'-W} + \lambda\,\delta(W'-W). \tag{47}$$


We must choose $\lambda$, which may be any function of the momentum $p$ and $\alpha'$, such that (47) represents the incident particles (19) together with only outward moving particles. [The right-hand side of (19), with $\alpha'$ substituted for $\alpha$, is actually of the form $\lambda\,\delta(W'-W)$, since the conditions $\alpha' = \alpha^0$ and $p = p^0$ for this right-hand side not to vanish lead to $W' = E - H_s(\alpha') = E - H_s(\alpha^0) = W^0$ and $W = W^0$, which together give $W' = W$.] Thus (47) must be

$$(p\alpha'|) = h^{3/2}\,\delta_{\alpha'\alpha^0}\,\delta(p-p^0) + (p\alpha'|V|k)(k|)\Big\{\frac{1}{W'-W} - i\pi\,\delta(W'-W)\Big\}, \tag{48}$$

and from the general formula (37) the scattering coefficient will be

$$\frac{4\pi^2W^0W'P'}{hc^4P^0}\,|(p'\alpha'|V|k)|^2\,|(k|)|^2. \tag{49}$$


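It remains for us to determine the value of $(k|)$. We can do this by substituting for $(p\alpha'|)$ in the second of equations (46) its value given by (48). This gives

$$
\begin{aligned}
\{E-E_k\}(k|) &= h^{3/2}(k|V|p^0\alpha^0) + (k|)\sum_{\alpha'}\int |(k|V|p\alpha')|^2\Big\{\frac{1}{W'-W} - i\pi\,\delta(W'-W)\Big\}\,dp\\
&= h^{3/2}(k|V|p^0\alpha^0) + (k|)\{a - ib\},
\end{aligned}
$$

where

$$a = \sum_{\alpha'}\int \frac{|(k|V|p\alpha')|^2}{W'-W}\,dp \tag{50}$$

and

$$
\begin{aligned}
b &= \pi\sum_{\alpha'}\int |(k|V|p\alpha')|^2\,\delta(W'-W)\,dp\\
&= \pi\sum_{\alpha'}\iiint |(k|V|P\omega\chi\alpha')|^2\,\delta(W'-W)\,P^2\,dP\,\sin\omega\,d\omega\,d\chi\\
&= \pi\sum_{\alpha'}\frac{P'W'}{c^2}\iint |(k|V|P'\omega\chi\alpha')|^2\,\sin\omega\,d\omega\,d\chi.
\end{aligned}
\tag{51}
$$

Thus

$$(k|) = \frac{h^{3/2}(k|V|p^0\alpha^0)}{E - E_k - a + ib}. \tag{52}$$

Note that $a$ and $b$ are real and that $b$ is positive.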
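The last step of (51) converts the $\delta$ function using $dW/dP = c^2P/W$, so that $\int \delta(W'-W)\,P^2\,dP = P'W'/c^2$. The following is a numerical check of that step, a sketch assuming the relativistic relation $W = c(m^2c^2 + P^2)^{1/2}$ (consistent with the condition $W' > mc^2$), units with $m = c = 1$, and a narrow normalized Gaussian in place of the $\delta$ function; these choices and the helper names are ours, not the text's.

```python
import math

def W_of_P(P, m=1.0, c=1.0):
    # Energy of the particle, assumed to be W = c * sqrt(m^2 c^2 + P^2)
    return c * math.sqrt(m * m * c * c + P * P)

def smeared_delta_integral(W_prime, sigma=1e-3, m=1.0, c=1.0,
                           P_lo=0.5, P_hi=1.0, n=100_000):
    """Midpoint-rule estimate of the integral of delta(W' - W(P)) P^2 dP,
    with the delta function replaced by a normalized Gaussian of width sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    step = (P_hi - P_lo) / n
    total = 0.0
    for i in range(n):
        P = P_lo + (i + 0.5) * step
        x = (W_prime - W_of_P(P, m, c)) / sigma
        total += norm * math.exp(-0.5 * x * x) * P * P * step
    return total

P_prime = 0.75
W_prime = W_of_P(P_prime)            # 1.25 in these units
numeric = smeared_delta_integral(W_prime)
exact = P_prime * W_prime            # P'W'/c^2 with c = 1
print(numeric, exact)
```

The integration range only needs to contain the point $P = P'$; the Gaussian is negligible elsewhere.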
This value for $(k|)$ substituted in (49) gives for the scattering coefficient

$$\frac{4\pi^2h^2W^0W'P'}{c^4P^0}\,\frac{|(p'\alpha'|V|k)|^2\,|(k|V|p^0\alpha^0)|^2}{(E-E_k-a)^2+b^2}. \tag{53}$$


One can obtain the total effective area that the incident particle must hit in order to be scattered anywhere by integrating (53) over all directions of scattering, i.e. by integrating over all directions of the vector $p'$ with its magnitude kept fixed at $P'$, and then summing over all $\alpha'$ that are to be taken into consideration, i.e. for which $W' > mc^2$. This gives, with the help of (51), the result

$$\frac{4\pi h^2W^0}{c^2P^0}\,\frac{b\,|(k|V|p^0\alpha^0)|^2}{(E-E_k-a)^2+b^2}. \tag{54}$$


If we suppose $E$ to vary continuously through the value $E_k$, the main variation of (53) or (54) will be due to the small denominator $(E - E_k - a)^2 + b^2$. If we neglect the dependence of the other factors in (53) and (54) on $E$, then the maximum scattering will occur when $E$ has the value $E_k + a$ and the scattering will be half its maximum when $E$ differs from this value by an amount $b$. The large amount of scattering that occurs for values of the energy of the incident particle that make $E$ nearly equal to $E_k$ gives rise to the phenomenon of an absorption line. The centre of the line is displaced by an amount $a$ from the resonance energy of the incident particle, i.e. the energy which would make the total energy just $E_k$, while the quantity $b$ is what is sometimes called the half-width of the line.
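Equation (52) makes the line shape explicit. The sketch below takes an invented level $E_k$, shift $a$ and half-width $b$, sets the numerator $h^{3/2}(k|V|p^0\alpha^0)$ to 1 (all assumptions for illustration), evaluates $|(k|)|^2$, and confirms that it peaks at $E_k + a$ and falls to half its peak value at $E_k + a \pm b$.

```python
def amplitude_k(E, E_k, a, b, numerator=1.0):
    # (k|) from (52); numerator stands for h^{3/2}(k|V|p0a0), set to 1 here
    return numerator / (E - E_k - a + 1j * b)

def line_shape(E, E_k, a, b):
    # |(k|)|^2 = 1 / ((E - E_k - a)^2 + b^2): the E-dependence of (53) and (54)
    return abs(amplitude_k(E, E_k, a, b)) ** 2

E_k, a, b = 10.0, 0.3, 0.05          # invented level, shift and half-width
peak = line_shape(E_k + a, E_k, a, b)
print(peak)                                          # approximately 1 / b^2 = 400
print(line_shape(E_k + a + b, E_k, a, b) / peak)     # approximately one half
print(line_shape(E_k + a - b, E_k, a, b) / peak)     # approximately one half

# Scan a grid of energies and locate the largest value
grid = [E_k + a + 0.001 * i for i in range(-500, 501)]
best = max(grid, key=lambda E: line_shape(E, E_k, a, b))
print(best)                                          # at E_k + a
```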


56. Emission and Absorption 


For studying emission and absorption we must consider non-stationary states of the system and must use the perturbation method of §47. To determine the coefficient of spontaneous emission we must take an initial state for which the particle is absorbed, so that the representative of the state is

$$(k|) = 1, \qquad (p\alpha|) = 0,$$

and determine the probability that at some later time the particle shall be on its way to infinity with a definite momentum. The method of §49 can now be applied. From the result (31) of that section we see that the probability per unit time per unit range of $\omega$ and $\chi$ of the particle being emitted in any direction $\omega', \chi'$ with the scatterer being left in state $\alpha'$ is

$$\frac{2\pi}{\hbar}\,|(k|V|W'\omega'\chi'\alpha')|^2, \tag{55}$$


provided, of course, that $\alpha'$ is such that the energy $W'$, given by (18), of the particle is greater than $mc^2$. For values of $\alpha'$ that do not satisfy this condition there is no emission possible. The matrix element $(k|V|W'\omega'\chi'\alpha')$ here must refer to a representation in which $W$, $\omega$, $\chi$ and $\alpha$ are diagonal with the weight function unity. The matrix elements of $V$ appearing in the three preceding sections refer to a representation in which $p_x$, $p_y$, $p_z$ are diagonal with the weight function unity, or $P$, $\omega$, $\chi$ are diagonal with the weight function $P^2\sin\omega$. They would thus refer to a representation in which $W$, $\omega$, $\chi$ are diagonal with the weight function $(dP/dW)P^2\sin\omega = (WP/c^2)\sin\omega$. Thus the matrix element $(k|V|W'\omega'\chi'\alpha')$ in (55) is equal to $\{(W'P'/c^2)\sin\omega'\}^{1/2}$ times our previous matrix element $(k|V|W'\omega'\chi'\alpha')$ or $(k|V|p'\alpha')$, so that (55) is equal to

$$\frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,\sin\omega'\,|(k|V|p'\alpha')|^2.$$

The probability of emission per unit solid angle per unit time, with the scatterer simultaneously dropping to state $\alpha'$, is thus

$$\frac{2\pi W'P'}{\hbar c^2}\,|(k|V|p'\alpha')|^2. \tag{56}$$

To obtain the total probability per unit time of the particle being emitted in any direction, with any final state for the scatterer, we must integrate (56) over all angles $\omega'$, $\chi'$ and sum over all states $\alpha'$ whose energy $H_s(\alpha')$ is such that $H_s(\alpha') + mc^2 < E_k$. The result is just $2b/\hbar$, where $b$ is defined by (51). There is thus this simple relation between the total emission coefficient and the half-breadth $b$ of the absorption line.

Let us now consider absorption. This requires that we shall take an initial state for which the particle is certainly not absorbed but is incident with a definite momentum. Thus the representative of the initial state must be of the form (41). We must now determine the probability of the particle being absorbed after time $T$. Since our final state $\psi_k$ is not one of a continuous range, we cannot use directly the result (31) of §49. If, however, we take

$$(p\alpha|)_0 = \delta_{\alpha\alpha^0}\,\delta(p-p^0), \qquad (k|)_0 = 0 \tag{57}$$

as the representative of the initial state, the analysis of §§47 and 49 is still applicable as far as equation (28) and shows us that the probability of the particle being absorbed into state $\psi_k$ after time $T$ is

$$2\,|(k|V|p^0\alpha^0)|^2\,\frac{1 - \cos\{(E_k-E)T/\hbar\}}{(E_k-E)^2}.$$

This corresponds to a distribution of incident particles of density $h^{-3}$, owing to the omission of the factor $h^{3/2}$ from (57), as compared with (41). The probability of there being an absorption after time $T$ when there is one incident particle crossing unit area per unit time is therefore

$$\frac{2h^3W^0}{c^2P^0}\,|(k|V|p^0\alpha^0)|^2\,\frac{1 - \cos\{(E_k-E)T/\hbar\}}{(E_k-E)^2}. \tag{58}$$


To obtain the absorption coefficient we must consider the incident particles not all to have exactly the same energy $W^0 = E - H_s(\alpha^0)$, but to have a distribution of energy values about the correct value $E_k - H_s(\alpha^0)$ required for absorption. If we take a beam of incident particles consisting of one crossing unit area per unit time per unit energy range, the probability of there being an absorption after time $T$ will be given by the integral of (58) with respect to $E$. This integral may be evaluated in the same way as (29) of §49 and is equal to

$$\frac{4\pi^2h^2W^0T}{c^2P^0}\,|(k|V|p^0\alpha^0)|^2.$$




The probability per unit time of an absorption taking place with an incident beam 
of one particle per unit area per unit time per unit energy range is therefore 


$$\frac{4\pi^2h^2W^0}{c^2P^0}\,|(k|V|p^0\alpha^0)|^2, \tag{59}$$


which is the absorption coefficient. 
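The $E$-integral used above rests on the standard result $\int_{-\infty}^{\infty} [1 - \cos(xT/\hbar)]\,x^{-2}\,dx = \pi T/\hbar$, with $x = E_k - E$ and the remaining factors treated as constant over the narrow range of $E$ that contributes. The following sketch checks this integral numerically, assuming $\hbar = 1$ and an arbitrary cut-off $L$ beyond which the tail is added analytically; the function name is illustrative.

```python
import math

def one_minus_cos_integral(T, hbar=1.0, L=1000.0, n=200_000):
    """Estimate the integral over all real x of (1 - cos(x T / hbar)) / x**2.

    The integrand is even, so integrate over (0, L) by the midpoint rule
    (which never evaluates x = 0), add 1/L for the tail beyond L (where the
    cosine term averages away), and double.
    """
    u = T / hbar
    step = L / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * step
        total += (1.0 - math.cos(u * x)) / (x * x) * step
    total += 1.0 / L
    return 2.0 * total

for T in (1.0, 2.0):
    print(T, one_minus_cos_integral(T), math.pi * T)   # last two columns agree closely
```

The result is linear in $T$, so dividing by $T$ leaves a probability per unit time, which is how the absorption coefficient (59) arises.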

The connexion between the absorption and emission coefficients (59) and (56) and the resonance scattering coefficients calculated in the preceding section should be noted. When the incident beam does not consist of particles all with the same energy, but consists of a unit distribution of particles per unit energy range crossing unit area per unit time, the total number of incident particles with energies near an absorption line that get scattered will be given by the integral of (54) with respect to $E$. If one neglects the dependence of the numerator of (54) on $E$, this integral will, since

$$\int_{-\infty}^{\infty}\frac{b\,dE}{(E-E_k-a)^2+b^2} = \pi,$$

have just the value (59). Thus the total number of scattered particles in the neighbourhood of an absorption line is equal to the total number absorbed. We can therefore regard all these scattered particles as absorbed particles that are subsequently re-emitted in a different direction. Further, the number of particles in the neighbourhood of the absorption line that get scattered per unit solid angle about a given direction specified by $p'$ and then belong to scatterers in state $\alpha'$ will be given by the integral with respect to $E$ of (53), which integral has in the same way the value

$$\frac{4\pi^3h^2W^0W'P'}{c^4P^0\,b}\,|(p'\alpha'|V|k)|^2\,|(k|V|p^0\alpha^0)|^2.$$

This is just equal to the absorption coefficient (59) multiplied by the emission coefficient (56) divided by $2b/\hbar$, the total emission coefficient. This is in agreement with the point of view of regarding the resonance scattered particles as those that are absorbed and then re-emitted, with the absorption and emission processes governed independently each by its own probability law, since this point of view would make the fraction of the total number of absorbed particles that are re-emitted in a unit solid angle about a given direction just the emission coefficient for this direction divided by the total emission coefficient.
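The equalities asserted in this paragraph can be checked with invented numbers. The sketch below assumes values for $h$, $c$, $W^0$, $P^0$, $W'$, $P'$, $b$ and the two squared matrix elements (all hypothetical), takes $\hbar = h/2\pi$ and the total emission coefficient $2b/\hbar$ from the text, treats the numerators of (53) and (54) as independent of $E$ as the text does, and estimates the Lorentzian area by quadrature.

```python
import math

def lorentzian_area(b, R=1000.0, n=200_000):
    """Midpoint-rule estimate of the integral over all E of b / (E**2 + b**2);
    the two tails beyond |E| = R are added as 2b/R, their leading behaviour."""
    step = 2.0 * R / n
    total = 0.0
    for i in range(n):
        E = -R + (i + 0.5) * step
        total += b / (E * E + b * b) * step
    return total + 2.0 * b / R

# Invented values in arbitrary units
h = 6.0
hbar = h / (2.0 * math.pi)
c, W0, P0 = 1.0, 1.3, 0.8        # incident particle
W1, P1 = 1.1, 0.45               # W', P' for one final scatterer state a'
b = 0.02                         # half-width, taken as given by (51)
V_in_sq, V_out_sq = 0.7, 0.4     # |(k|V|p0a0)|^2 and |(p'a'|V|k)|^2

area = lorentzian_area(1.0)      # rescaling E by b shows the area is the same for every b > 0

absorption = 4 * math.pi**2 * h**2 * W0 / (c**2 * P0) * V_in_sq         # (59)
emission = 2 * math.pi * W1 * P1 / (hbar * c**2) * V_out_sq             # (56)
total_emission = 2 * b / hbar

# Integrals over E of (54) and (53): only the Lorentzian factor depends on E
scattered_all = 4 * math.pi * h**2 * W0 / (c**2 * P0) * V_in_sq * area
scattered_dir = (4 * math.pi**2 * h**2 * W0 * W1 * P1 / (c**4 * P0)
                 * V_out_sq * V_in_sq * area / b)

print(area / math.pi)                                            # close to 1
print(scattered_all / absorption)                                # close to 1
print(scattered_dir / (absorption * emission / total_emission))  # close to 1
```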


X. SYSTEMS CONTAINING 
SEVERAL SIMILAR PARTICLES 


57. Symmetrical and Antisymmetrical States 


If a system in atomic physics contains a number of particles of the same kind,
e.g. a number of electrons, the particles are absolutely indistinguishable one from 
another. No observable change is made when two of them are interchanged. 
This circumstance gives rise to some curious phenomena in quantum mechanics 
having no analogue in the classical theory, which arise from the fact that in 
quantum mechanics a transition may occur resulting in merely the interchange 
of two similar particles, which transition then could not be detected by 
any observational means. A satisfactory theory ought, of course, to count 
two observationally indistinguishable states as the same state and to deny that 
any transition does occur when two similar particles exchange places. We shall 
find that such a theory can be developed in agreement with the principles of 
quantum mechanics. 

Suppose we have a system containing $n$ similar particles. We may take as our dynamical variables a set of variables $\xi_1$ describing the first particle, the corresponding set $\xi_2$ describing the second particle, and so on up to the set $\xi_n$ describing the $n$th particle. We shall then have the $\xi_r$'s commuting with the $\xi_s$'s for $r \neq s$. (We may require certain extra variables, describing what the system consists of in addition to the $n$ similar particles, but it is not necessary to mention these explicitly in the present chapter.) The Hamiltonian describing the motion of the system will now be expressible as a function of the $\xi_1, \xi_2, \ldots, \xi_n$. The fact that the particles are similar requires that the Hamiltonian shall be a symmetrical function of the $\xi_1, \xi_2, \ldots, \xi_n$, i.e. it shall remain unchanged when the sets of variables $\xi_r$ are interchanged or permuted in any way. This condition must hold, no matter what perturbations are applied to the system.

We may take a representation with observables $q_1, q_2, \ldots, q_n$ diagonal, which are such that the $q_1$'s are a complete set of commuting observables describing the first particle, the $q_2$'s are the corresponding observables describing the second particle, and so on. We may further choose the phases of the representation in the same way for each of the particles. (This means, for example, that if a certain momentum $p_1$ describing the first particle is represented by $-i\hbar\,\partial/\partial q_1$, the corresponding momentum $p_r$ describing the $r$th particle must be represented by $-i\hbar\,\partial/\partial q_r$.) The representation will then treat all the particles on the same footing. The condition that the Hamiltonian $H$ is symmetrical between all the particles may now be expressed by the condition that its representative $(q_1'q_2'\ldots q_n'|H|q_1''q_2''\ldots q_n'')$, or $(q'|H|q'')$ for brevity, is symmetrical between all the $q$'s, i.e. that it remains unchanged if any permutation is applied to the $q'$'s and the same permutation to the $q''$'s. This condition may be expressed analytically thus,

$$(q'|H|q'') = (Pq'|H|Pq''), \tag{1}$$

where $P$ denotes any permutation of the numbers $1, 2, \ldots, n$ and $Pq'$ denotes the set of numbers obtained by applying the permutation $P$ to the suffixes of $q_1', q_2', \ldots, q_n'$.

Let $(q_1'q_2'\ldots q_n'|)$ or $(q'|)$ be the wave function representing any state. It will satisfy the wave equation

$$i\hbar\frac{d}{dt}(q'|) = \int (q'|H|q'')\,dq''\,(q''|). \tag{2}$$

If we apply any permutation $P$ to the variables $q'$ in $(q'|)$ we shall obtain a function $(Pq'|)$ satisfying

$$i\hbar\frac{d}{dt}(Pq'|) = \int (Pq'|H|q'')\,dq''\,(q''|) = \int (Pq'|H|Pq'')\,dq''\,(Pq''|),$$

since we can apply any permutation to the variables of integration $q''$ in the integrand without changing the value of the integral. With the help of (1) this becomes

$$i\hbar\frac{d}{dt}(Pq'|) = \int (q'|H|q'')\,dq''\,(Pq''|), \tag{3}$$

which shows that $(Pq'|)$ is a solution of the wave equation (2). Hence if we apply any permutation to the variables in a solution of the wave equation we obtain another solution.

Suppose we take a wave function $(q'|)$ which, at some particular time $t$, is a symmetrical function of all the $q'$'s, so that

$$(q'|) = (Pq'|) \tag{4}$$

for any $P$. The right-hand sides of (2) and (3) are now equal, so that

$$\frac{d}{dt}(q'|) = \frac{d}{dt}(Pq'|).$$

This equation is the time derivative of (4) and shows that if (4) holds at one particular time it holds also at a slightly later time, and thus by induction it holds at all times. Thus if a wave function is initially symmetrical it always remains symmetrical.

Similarly, we may take a wave function $(q'|)$ which, at some particular time, is antisymmetrical, i.e. $(q_1'q_2'\ldots q_n'|)$ changes sign with interchange of any pair of $q'$'s. We shall then have

$$(q'|) = \pm(Pq'|), \tag{5}$$

the $+$ or $-$ sign being taken according to whether the permutation $P$ is even or odd (i.e. according to whether $P$ can be built up from an even or an odd number of simple interchanges). The same argument as before now shows that if a wave function is initially antisymmetrical it always remains antisymmetrical.
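The permanence of symmetry can be seen in a toy model. The sketch below takes two particles, each restricted to three sites (a discrete stand-in for the $q$'s), with an invented single-particle Hamiltonian and an invented interaction that depends symmetrically on the two positions. It checks condition (1) for the interchange of the two particles, then evolves a symmetrical and an antisymmetrical state by summing the Taylor series of $e^{-iHt}$ with $\hbar = 1$. The model, its numbers and the helper names are assumptions made for illustration.

```python
import itertools

N_SITES = 3
h1 = [[0.0, -1.0, 0.0],
      [-1.0, 0.5, -1.0],
      [0.0, -1.0, 0.2]]               # invented single-particle Hamiltonian (real symmetric)

def U(i, j):
    # Invented interaction depending only on the two positions, symmetric in i and j
    return 2.0 if i == j else 1.0 / abs(i - j)

basis = list(itertools.product(range(N_SITES), repeat=2))   # labels (q1', q2')
index = {q: n for n, q in enumerate(basis)}

def H_element(q, r):
    # (q1' q2'|H|q1'' q2''): each particle moves under h1, plus the diagonal interaction
    (i, j), (k, l) = q, r
    value = h1[i][k] * (j == l) + (i == k) * h1[j][l]
    if q == r:
        value += U(i, j)
    return value

H = [[H_element(q, r) for r in basis] for q in basis]

def apply(M, v):
    return [sum(M[a][b] * v[b] for b in range(len(v))) for a in range(len(M))]

def evolve(v, t, terms=60):
    """Return exp(-i H t) v, summing the Taylor series term by term (hbar = 1)."""
    result, term = list(v), list(v)
    for n in range(1, terms):
        term = [(-1j * t / n) * x for x in apply(H, term)]
        result = [r + x for r, x in zip(result, term)]
    return result

def interchange(v):
    # The wave function with the two particles interchanged: (q1' q2'|) -> (q2' q1'|)
    return [v[index[(j, i)]] for (i, j) in basis]

# Condition (1) for the interchange of particles 1 and 2
H_is_symmetric = all(H[index[(j, i)]][index[(l, k)]] == H[index[(i, j)]][index[(k, l)]]
                     for (i, j) in basis for (k, l) in basis)

raw = [complex(n + 1, 0.1 * n) for n in range(len(basis))]   # arbitrary amplitudes
sym0 = [x + y for x, y in zip(raw, interchange(raw))]
anti0 = [x - y for x, y in zip(raw, interchange(raw))]

t = 1.0
sym_t, anti_t = evolve(sym0, t), evolve(anti0, t)
sym_err = max(abs(x - y) for x, y in zip(sym_t, interchange(sym_t)))
anti_err = max(abs(x + y) for x, y in zip(anti_t, interchange(anti_t)))
print(H_is_symmetric, sym_err, anti_err)   # True, then two numbers at round-off level
```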

Let us make a canonical transformation to a $Q$-representation which, like the original $q$-representation, treats all the particles on the same footing. This means that the $Q$'s consist of corresponding sets of observables $Q_1, Q_2, \ldots, Q_n$ describing the first, second, ..., $n$th particle respectively and that the phases are chosen in the same way for each of the particles. The transformation function will now, from (11) of Chapter V, be of the form

$$(Q_1'Q_2'\ldots Q_n'|q_1'q_2'\ldots q_n') = (Q_1'|q_1')(Q_2'|q_2')\ldots(Q_n'|q_n'), \tag{6}$$

in which each factor $(Q_r'|q_r')$ is the same function of its variables $Q_r'$ and $q_r'$. This condition gives, if we denote $(Q_1'Q_2'\ldots Q_n'|q_1'q_2'\ldots q_n')$ by $(Q'|q')$ for brevity,

$$(Q'|q') = (PQ'|Pq') \tag{7}$$

for an arbitrary permutation $P$. The new wave function representing any state is given by

$$(Q'|) = \int (Q'|q')\,dq'\,(q'|). \tag{8}$$

From this equation we can deduce that

$$(PQ'|) = \int (PQ'|q')\,dq'\,(q'|) = \int (PQ'|Pq')\,dq'\,(Pq'|) = \int (Q'|q')\,dq'\,(Pq'|) \tag{9}$$

with the help of (7). Now if $(q'|)$ is symmetrical, so that equation (4) holds, the right-hand sides of (8) and (9) are equal. We then have $(Q'|) = (PQ'|)$, so that $(Q'|)$ is also symmetrical. Similarly, if $(q'|)$ is antisymmetrical, $(Q'|)$ is also antisymmetrical. Thus the property of the representative of a state of being symmetrical or antisymmetrical remains invariant under a transformation of the coordinate system. This invariance shows that the property of being symmetrical or antisymmetrical is a property of the states themselves and not merely a property of their representatives. Thus we can talk about symmetrical and antisymmetrical states. Our preceding result shows that if a state is initially symmetrical or antisymmetrical, it always remains so.
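The invariance argument from (6) to (9) can likewise be checked on a small example. In this sketch two particles each have three basic states, the transformation function $(Q'|q')$ is a product of one invented unitary $3 \times 3$ matrix used identically for each particle, and sums over a discrete basis replace the integrals; all of this, and the helper names, are assumed for illustration.

```python
import itertools

d = 3
# Invented single-particle transformation function (Q'|q'), the same for each particle
T = [[0.6, 0.8j, 0.0],
     [0.8j, 0.6, 0.0],
     [0.0, 0.0, 1.0]]

basis = list(itertools.product(range(d), repeat=2))

def transform_element(Q, q):
    # (Q1'Q2'|q1'q2') = (Q1'|q1')(Q2'|q2'), the product form (6)
    return T[Q[0]][q[0]] * T[Q[1]][q[1]]

def transform(psi):
    # (Q'|) = sum over q' of (Q'|q')(q'|), the discrete analogue of (8)
    return {Q: sum(transform_element(Q, q) * psi[q] for q in basis) for Q in basis}

def interchanged(psi):
    return {(x, y): psi[(y, x)] for (x, y) in basis}

raw = {q: complex(3 * q[0] + q[1], q[0] - 2 * q[1]) for q in basis}   # arbitrary amplitudes
sym = {q: raw[q] + raw[(q[1], q[0])] for q in basis}
anti = {q: raw[q] - raw[(q[1], q[0])] for q in basis}

# Relation (7) for the interchange of the two particles
relation_7 = all(transform_element((Q[1], Q[0]), (q[1], q[0])) == transform_element(Q, q)
                 for Q in basis for q in basis)

sym_Q, anti_Q = transform(sym), transform(anti)
sym_err = max(abs(sym_Q[q] - interchanged(sym_Q)[q]) for q in basis)
anti_err = max(abs(anti_Q[q] + interchanged(anti_Q)[q]) for q in basis)
print(relation_7, sym_err, anti_err)   # True, then two numbers at round-off level
```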

The invariance and permanence of the symmetry properties of the states means 
that for some particular kind of particle it is quite possible for only symmetrical 
or only antisymmetrical states to occur in nature. Whether this is the case 
cannot be decided by any general theoretical considerations, but can be settled 
only by reference to special experimentally determined facts about the particles 
in question. For photons one can settle the question by making use of Planck’s 
radiation law. Only when one assumes the symmetrical states for photons does 
one get a statistical mechanics leading to Planck’s law for radiation in statistical 
equilibrium. This statistical mechanics is known as the Einstein-Bose statistics, 
as it was first introduced by Satyendra Nath Bose and Albert Einstein before 
the arrival of the modern quantum mechanics. 

For electrons we use the fact that, if we make the approximation of regarding the electrons in an atom as each moving in its own 'orbit' (i.e. as being each describable by its own wave function involving only its own variables), then no two electrons will ever be in the same orbit. This fact, which is known as Pauli's exclusion principle, may be inferred from general experimental evidence on atomic structure. Let us see how to fit it in with the theory. If the wave functions representing the different electronic orbits are

$$(q'|\alpha_1),\ (q'|\alpha_2),\ \ldots,\ (q'|\alpha_n),$$

a wave function representing the whole atom will be given by the product

$$(q_1'|\alpha_1)(q_2'|\alpha_2)\ldots(q_n'|\alpha_n) = (q'|\alpha), \tag{10}$$

say, for brevity. Other wave functions representing the same distribution of electrons over the various orbits may be obtained by applying any permutation to the $\alpha$'s in (10). There will be altogether $n!$ such wave functions, the general one being $(q'|P\alpha)$. Any linear combination of these wave functions will also represent the same electron distribution. One such linear combination is the sum

$$\sum_P (q'|P\alpha), \tag{11}$$

which is symmetrical between all the $q'$'s. Another is

$$\sum_P \pm(q'|P\alpha), \tag{12}$$

the $+$ or $-$ sign being taken according to whether $P$ is an even or odd permutation, and this one is antisymmetrical. The antisymmetrical wave function (12) has the property that it vanishes identically if two of the $\alpha$'s are equal. Hence if we assume that for electrons only antisymmetrical states occur, we shall get the result that there are no states with two electrons in the same orbit, which is just Pauli's exclusion principle. This assumption is the only one we can make which will lead to Pauli's exclusion principle.
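The constructions (11) and (12), and the vanishing of (12) when two orbits coincide, can be checked directly. This sketch uses three invented one-particle wave functions sampled at four points (a discrete stand-in for $q'$) and sums over all $3! = 6$ permutations, finding the sign of each permutation by counting inversions. The orbitals and helper names are illustrative assumptions.

```python
import itertools

def parity(perm):
    """+1 for an even permutation, -1 for an odd one (counted by inversions)."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def product_state(orbitals, qs):
    # (q1'|a1)(q2'|a2)...(qn'|an), the product (10)
    value = 1.0
    for orbital, q in zip(orbitals, qs):
        value *= orbital[q]
    return value

def combination(orbitals, qs, antisymmetric):
    """Sum over P of (q'|Pa) as in (11), or of +-(q'|Pa) as in (12)."""
    total = 0.0
    for perm in itertools.permutations(range(len(orbitals))):
        sign = parity(perm) if antisymmetric else 1
        total += sign * product_state([orbitals[p] for p in perm], qs)
    return total

# Invented orbitals, each sampled at the four points q' = 0, 1, 2, 3
a1 = [1.0, 0.5, -0.3, 0.2]
a2 = [0.1, -0.7, 0.4, 0.9]
a3 = [0.6, 0.2, 0.8, -0.5]
orbs = [a1, a2, a3]

print(combination(orbs, (0, 1, 2), False), combination(orbs, (2, 0, 1), False))  # equal up to round-off
print(combination(orbs, (0, 1, 2), True), combination(orbs, (1, 0, 2), True))    # opposite signs

# Two electrons in the same orbit: (12) vanishes at every point
largest = max(abs(combination([a1, a1, a3], qs, True))
              for qs in itertools.product(range(4), repeat=3))
print(largest)   # zero up to round-off
```

The antisymmetrical sum is the determinant of the matrix of orbital values, which is why a repeated orbit makes it vanish.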

In this way we can see that for photons we must take the symmetrical states 
and for electrons the antisymmetrical states. When only the symmetrical or only 
the antisymmetrical states are allowed for a particular kind of particle, the theory 
can no longer make a distinction between two states which differ only through 
a permutation of the particles, so that the difficulties mentioned at the beginning 
of this section disappear. 


58. Permutations as Dynamical Variables 


Let us now build up a general theory for a system containing n similar particles 
when states with any kind of symmetry properties are allowed, i.e. when there is 
no restriction to only symmetrical or only antisymmetrical states. The general 
state now will not be symmetrical or antisymmetrical, nor will it be expressible 
linearly in terms of symmetrical and antisymmetrical states when n > 2. 

If $P$ denotes any permutation and $\psi$ any $\psi$-vector, we can give a meaning to $P\psi$, the $\psi$-vector obtained by operating on $\psi$ with $P$. We define $P\psi$ to be the $\psi$-vector whose representative is $(Pq'|)$, obtained by applying the permutation $P$ to the representative $(q'|)$ of $\psi$. This $P\psi$ is independent of the representation used for defining it, as follows from equation (9). Further, the operation by which $P\psi$ is obtained from $\psi$ is a linear one. Hence we can regard $P\psi$ as the product of a dynamical variable $P$ with $\psi$, i.e. we can regard the permutation $P$ as a dynamical variable.

There are $n!$ permutations, each of which can be regarded as a dynamical variable. One of them, $P_1$ say, is the identical permutation, which is equal to unity. If $\psi$ denotes a symmetrical state, we have

$$P\psi = \psi \tag{13}$$

for any $P$, and hence a symmetrical $\psi$ is an eigen-$\psi$ of every permutation belonging to the eigenvalue unity. Similarly, an antisymmetrical $\psi$ is an eigen-$\psi$ of every permutation belonging to the eigenvalue $\pm 1$ according to whether the permutation is even or odd. The product of any two permutations is a third permutation and hence any function of the permutations is reducible to a linear function of them. Any permutation $P$ has a reciprocal $P^{-1}$ satisfying

$$PP^{-1} = P^{-1}P = 1.$$


A permutation $P$, like any other dynamical variable, can be represented by a matrix. Its $q$-representative $(q'|P|q'')$ will satisfy

$$\int (q'|P|q'')\,dq''\,(q''|) = (Pq'|)$$

and hence

$$(q'|P|q'') = \delta(Pq' - q'') \tag{14}$$

$$(q'|P|q'') = \delta(q' - P^{-1}q''). \tag{15}$$

The $\delta$ function in (14) or (15) denotes the product of $n$ factors of the type $\delta(\{Pq'\}_r - q_r'')$ or $\delta(q_r' - \{P^{-1}q''\}_r)$ respectively. The conjugate complex of $P$ is given by

$$(q'|\bar P|q'') = \overline{(q''|P|q')} = \delta(q'' - P^{-1}q') = (q'|P^{-1}|q'')$$

from (15) and (14), so that

$$\bar P = P^{-1}. \tag{16}$$

Thus a permutation is not in general a real dynamical variable, its conjugate complex being equal to its reciprocal.
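Relation (16) is easy to see with finite matrices. In this sketch three particles each have two basic states, so each permutation becomes an $8 \times 8$ matrix built from the discrete form of (14); the sizes, the convention $(Pq')_r = q'_{P(r)}$ for applying a permutation to the suffixes, and the helper names are assumptions for illustration. The check confirms that the conjugate transpose of every permutation matrix is the matrix of the reciprocal permutation, and that a 3-cycle, unlike an interchange, is not equal to its own conjugate.

```python
import itertools

n, d = 3, 2                                   # three particles, two basic states each
basis = list(itertools.product(range(d), repeat=n))

def act(perm, qs):
    # Apply the permutation to the particle suffixes: (Pq')_r = q'_{perm[r]}
    return tuple(qs[perm[r]] for r in range(n))

def matrix(perm):
    # (q'|P|q'') = delta(Pq' - q''), the discrete form of (14)
    return [[1 if act(perm, q1) == q2 else 0 for q2 in basis] for q1 in basis]

def reciprocal(perm):
    inv = [0] * len(perm)
    for r, p in enumerate(perm):
        inv[p] = r
    return tuple(inv)

def conjugate_transpose(M):
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

identity = [[int(i == j) for j in range(len(basis))] for i in range(len(basis))]

for perm in itertools.permutations(range(n)):
    P = matrix(perm)
    assert conjugate_transpose(P) == matrix(reciprocal(perm))   # (16)
    assert matmul(conjugate_transpose(P), P) == identity

interchange, cycle = (1, 0, 2), (1, 2, 0)
print(conjugate_transpose(matrix(interchange)) == matrix(interchange))   # True
print(conjugate_transpose(matrix(cycle)) == matrix(cycle))               # False
```

An interchange is its own reciprocal and so is real in this sense; a 3-cycle is not.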

Any permutation of the numbers $1, 2, 3, \ldots, n$ may be expressed in the cyclic notation, e.g. with $n = 8$

$$P_a = (143)(27)(58)(6), \tag{17}$$

in which each number is to be replaced by the succeeding number in a bracket, unless it is the last in a bracket, when it is to be replaced by the first in that bracket. Thus $P_a$ changes the numbers 12345678 into 47138625. The type of any permutation is specified by the partition of the number $n$ which is provided by the number of numbers in each of the brackets. Thus the type of $P_a$ is specified by the partition $8 = 3+2+2+1$. Permutations of the same type, i.e. corresponding to the same partition, we shall call similar. Thus, for example, $P_a$ in (17) is similar to

$$P_b = (871)(35)(46)(2). \tag{18}$$

The whole of the $n!$ possible permutations may be divided into sets of similar permutations, each such set being called a class. The permutation $P_1 = 1$ forms a class by itself. Any permutation is similar to its reciprocal.
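The cyclic notation and the notion of type can be made concrete with a short sketch (helper names are illustrative). It builds $P_a = (143)(27)(58)(6)$, the reading consistent with the stated image 47138625, and $P_b$ of (18) as maps on the numbers 1 to 8, applies $P_a$ to the sequence 12345678, and reads off the partition given by the cycle lengths.

```python
def from_cycles(cycles, n):
    """Map each of 1..n to its image under a permutation given in cyclic notation."""
    image = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, i in enumerate(cycle):
            # replaced by the succeeding number, or by the first if it is the last
            image[i] = cycle[(pos + 1) % len(cycle)]
    return image

def cycle_type(image):
    """The partition of n given by the cycle lengths, largest first."""
    seen, lengths = set(), []
    for start in image:
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = image[i]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths, reverse=True)

def reciprocal(image):
    return {v: k for k, v in image.items()}

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)], 8)   # (17)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)], 8)   # (18)

print("".join(str(P_a[i]) for i in range(1, 9)))          # 47138625
print(cycle_type(P_a), cycle_type(P_b))                   # [3, 2, 2, 1] [3, 2, 2, 1]
print(cycle_type(reciprocal(P_a)) == cycle_type(P_a))     # True: similar to its reciprocal
```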




When two permutations $P_a$ and $P_b$ are similar, either of them, $P_b$ say, may be obtained by making a certain permutation $P$ in the other $P_a$. Thus, in our example (17), (18) we can take $P$ to be the permutation that changes 14327586 into 87135462, i.e. the permutation

$$P = (18623)(475).$$

We then have the algebraic relation between $P_a$ and $P_b$

$$P_b = PP_aP^{-1}. \tag{19}$$

To verify this, we observe that the product $P_a\psi$ of $P_a$ with any $\psi$ is changed into $P_b\psi$ if one applies the permutation $P$ to the $P_a$ in the product but not to the $\psi$. If we multiply the product by $P$ on the left, we are applying this permutation to the whole $\psi$-symbol $P_a\psi$ and thus to both the $P_a$ and the $\psi$, so that we must insert another factor $P^{-1}$ between the $P_a$ and the $\psi$, giving us $PP_aP^{-1}\psi$ to equate to $P_b\psi$. An alternative proof consists in noting that when the permutation $P$ is applied to the representative $\delta(P_aq' - q'')$ of $P_a$, it gives $\delta(PP_aq' - Pq'')$ or $\delta(PP_aP^{-1}q' - q'')$, which is just the representative of $PP_aP^{-1}$.

Equation (19) is the general formula showing when two permutations P, and P, 
are similar. Of course P is not uniquely determined when P, and P, are given, 
but the existence of any P satisfying (19) is sufficient to show that P, and P, 
are similar. 
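Relation (19) can be checked for the example of (17) and (18). The sketch below (helper names illustrative) takes $P_a = (143)(27)(58)(6)$, $P_b = (871)(35)(46)(2)$ and $P = (18623)(475)$, and composes permutations as maps on the numbers 1 to 8 with the rightmost factor acting first; under that convention $PP_aP^{-1}$ reproduces $P_b$. Replacing $P$ by $PP_a$ gives the same result, illustrating that $P$ is not unique.

```python
def from_cycles(cycles, n):
    image = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, i in enumerate(cycle):
            image[i] = cycle[(pos + 1) % len(cycle)]
    return image

def compose(*perms):
    """Product of permutations as maps on numbers; the rightmost factor acts first."""
    def image_of(i):
        for p in reversed(perms):
            i = p[i]
        return i
    return {i: image_of(i) for i in perms[0]}

def reciprocal(p):
    return {v: k for k, v in p.items()}

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)], 8)   # (17)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)], 8)   # (18)
P = from_cycles([(1, 8, 6, 2, 3), (4, 7, 5)], 8)          # P = (18623)(475)

# P carries the sequence 14327586 (P_a written out) into 87135462 (P_b written out)
print("".join(str(P[int(ch)]) for ch in "14327586"))      # 87135462
print(compose(P, P_a, reciprocal(P)) == P_b)             # True: relation (19)

Q = compose(P, P_a)                                       # another permutation that works
print(compose(Q, P_a, reciprocal(Q)) == P_b)             # True
```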


59. Permutations as Constants of the Motion 


Let us see how one of our permutation dynamical variables $P$ varies with the time. The fact that the Hamiltonian is symmetrical leads at once to the equation

$$PH = HP, \tag{20}$$

as may be verified by a similar argument to that used for equation (19), or alternatively by a direct application of the matrix representatives. Thus from (14)

$$(q'|PH|q'') = \int \delta(Pq' - q''')\,dq'''\,(q'''|H|q'') = (Pq'|H|q'')$$

and from (15)

$$(q'|HP|q'') = \int (q'|H|q''')\,dq'''\,\delta(q''' - P^{-1}q'') = (q'|H|P^{-1}q''),$$

and the two right-hand sides are now equal from (1). Equation (20) shows that each permutation is a constant of the motion. The $P$'s are still constants when arbitrary perturbations are applied to the system, provided the perturbing energy to be added to the Hamiltonian is symmetrical. Thus the constancy of the $P$'s is perfect.

In dealing with any system in quantum mechanics, when we have found a constant of the motion $\alpha$, we know that if for any state of motion $\alpha$ initially has the numerical value $\alpha'$, then it always has this value, so that we can assign different numbers $\alpha'$ to the different states and so obtain a useful classification of the states. The procedure is not so straightforward, however, when we have several constants of the motion $\alpha$ which do not commute (as is the case with our permutations $P$), since we cannot assign numerical values for all the $\alpha$'s simultaneously to any state. Let us first take the case of a system whose Hamiltonian does not involve the time explicitly. The existence of constants of the motion $\alpha$ which do not commute is then a sign that the system is degenerate. This is because, for a non-degenerate system, the Hamiltonian $H$ by itself forms a complete set of commuting observables and hence, from the theorem at the top of page 56, each of the $\alpha$'s is a function of $H$ and therefore commutes with any other $\alpha$.

We must now look for a function $\beta$ of the $\alpha$'s which has one and the same numerical value $\beta'$ for all those states belonging to one energy-level $H'$, so that we can use $\beta$ for classifying the energy-levels of the system. We can express the condition for $\beta$ by saying that it must be a function of $H$ and must therefore commute with every dynamical variable that commutes with $H$, i.e. with every constant of the motion. If the $\alpha$'s are the only constants of the motion, or if they are a set that commute with all other independent constants of the motion, our problem reduces to finding a function $\beta$ of the $\alpha$'s which commutes with all the $\alpha$'s. We can then assign a numerical value $\beta'$ for $\beta$ to each energy-level of the system. If we can find several such functions $\beta$, they must all commute with each other, so that we can give them all numerical values simultaneously and obtain a complete classification of the energy-levels. When the Hamiltonian involves the time explicitly one cannot talk about energy-levels, but the $\beta$'s will still give a useful classification for the states.

We follow this method in dealing with our permutations P. We must find
a function χ of the P's such that $P\chi P^{-1} = \chi$ for every P. It is evident that
a possible χ is $\sum P_c$, the sum of all the permutations in a certain class c,
i.e. the sum of a set of similar permutations, since $P\chi P^{-1}$ must consist of
the same permutations summed in a different order. There will be one such χ
for each class. Further, there can be no other independent χ, since an arbitrary
function of the P's can be expressed as a linear function of them with numerical
coefficients, and it will not then commute with every P unless the coefficients of
similar P's are always the same. We thus obtain all the χ's that can be used for
classifying the states. It is convenient to define each χ as an average instead of
a sum, thus

$$\chi_c = n_c^{-1}\sum_c P_c,$$

where $n_c$ is the number of P's in the class c. An alternative expression for $\chi_c$ is

$$\chi_c = n!^{-1}\sum_P P\,P_c\,P^{-1}, \qquad (21)$$


the sum being extended over all the n! permutations P, it being easy to verify that
this sum contains each member of the class c the same number of times. For each
permutation P there is one χ, χ(P) say, equal to the average of all permutations
similar to P. One of the χ's is $\chi(P_1) = 1$, $P_1$ being the identical permutation.
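The statement that each class average commutes with every permutation can be checked directly for three particles. The sketch below is illustrative only: it assumes permutations are stored as Python tuples of images and linear functions of permutations as dictionaries mapping each permutation to its coefficient; the helper names `compose`, `multiply` and `class_average` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p (permutations as tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def multiply(x, y):
    """Product of two linear functions of permutations stored as {permutation: coefficient}."""
    out = {}
    for p, a in x.items():
        for q, b in y.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + a * b
    return {r: c for r, c in out.items() if c != 0}

def class_average(pc, n):
    """Equation (21): chi_c = (n!)^-1 times the sum over all P of P P_c P^-1."""
    group = list(permutations(range(n)))
    out = {}
    for p in group:
        r = compose(compose(p, pc), inverse(p))
        out[r] = out.get(r, 0) + Fraction(1, len(group))
    return out

n = 3
group = list(permutations(range(n)))
interchange = (1, 0, 2)              # interchange of particles 1 and 2 (labels counted from 0)
chi = class_average(interchange, n)

# Each of the three interchanges appears with the same weight 1/3
assert len(chi) == 3 and set(chi.values()) == {Fraction(1, 3)}
# chi commutes with every P, i.e. P chi P^-1 = chi
for p in group:
    assert multiply({p: 1}, chi) == multiply(chi, {p: 1})
# The class of the identical permutation P_1 gives chi(P_1) = 1
identity = tuple(range(n))
assert class_average(identity, n) == {identity: 1}
```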

The constants of the motion $\chi_1, \chi_2, \ldots, \chi_m$ obtained in this way will each have
a definite numerical value for every stationary state of the system, in the case
when the Hamiltonian does not involve the time explicitly, and also in the general
case can be used for classifying the states, there being one set of states for every
permissible set of numerical values $\chi_1', \chi_2', \ldots, \chi_m'$ for the χ's. Since the χ's are
perfect constants of the motion, these sets of states will be exclusive, i.e. transitions
will never take place from a state in one set to a state in another.

The permissible sets of values χ' that one can give to the χ's are limited by
the fact that there exist algebraic relations between the χ's. The product of any
two χ's, $\chi_p\chi_q$, is of course expressible as a linear function of the P's, and since
it commutes with every P it must be expressible as a linear function of the χ's, thus

$$\chi_p\chi_q = a_1\chi_1 + a_2\chi_2 + \cdots + a_m\chi_m, \qquad (22)$$


where the a's are numbers. Any numerical values χ' that one gives to the χ's must
be eigenvalues of the χ's and must satisfy these same algebraic equations. For every
solution χ' of these equations there is one exclusive set of states. One solution is
evidently $\chi_p' = 1$ for every $\chi_p$, giving the set of symmetrical states satisfying (13).
A second obvious solution, giving the set of antisymmetrical states, is $\chi_p' = \pm 1$,
the + or − sign being taken according to whether the permutations in the class p
are even or odd. The other solutions may be worked out in any special case by
ordinary algebraic methods, as the coefficients a in (22) may be obtained directly
by a consideration of the types of permutation to which the χ's concerned refer.
Any solution is, apart from a certain factor, what is called in group theory
a character of the group of permutations. The χ's are all real dynamical variables,
since each P and its conjugate complex $P^{-1}$ are similar and will occur added
together in the definition of any χ, so that the χ''s must be all real numbers.
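The two obvious solutions of (22) can likewise be checked by machine. The sketch below, for permutations of four particles, is a hypothetical illustration using the same tuple and dictionary conventions: it forms each class average, multiplies two of them, reads off the coefficients a of (22), and confirms that χ' = 1 and χ' = ±1 (according to parity) satisfy the resulting equations. Similar permutations are identified by their cycle type, an assumption standard in group theory rather than something stated in the text.

```python
from fractions import Fraction
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p (permutations as tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def cycle_type(p):
    """Cycle lengths of p in decreasing order; similar permutations share a cycle type."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def sign(p):
    # A cycle of length L is a product of L - 1 interchanges
    return (-1) ** sum(L - 1 for L in cycle_type(p))

def multiply(x, y):
    """Product of two linear functions of permutations stored as {permutation: coefficient}."""
    out = {}
    for p, a in x.items():
        for q, b in y.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + a * b
    return out

n = 4
classes = {}
for p in permutations(range(n)):
    classes.setdefault(cycle_type(p), []).append(p)

# chi_c: the average of the members of class c
chi = {c: {p: Fraction(1, len(ms)) for p in ms} for c, ms in classes.items()}

def expand_in_chis(x):
    """Coefficients a_c in x = sum_c a_c chi_c, as in (22), for x commuting with every P."""
    # chi_c gives each member of class c the weight 1/n_c, so a_c = n_c times that weight in x
    return {c: len(ms) * x.get(ms[0], 0) for c, ms in classes.items()}

symmetric = {c: 1 for c in classes}                              # chi_p' = 1
antisymmetric = {c: sign(ms[0]) for c, ms in classes.items()}    # chi_p' = +1 or -1

for p in classes:
    for q in classes:
        a = expand_in_chis(multiply(chi[p], chi[q]))
        for solution in (symmetric, antisymmetric):
            lhs = solution[p] * solution[q]
            assert sum(a[c] * solution[c] for c in classes) == lhs
```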

The number of possible solutions of the equations (22) may easily be
determined, since it must equal the number of different eigenvalues of an arbitrary
function B of the χ's. We can express B as a linear function of the χ's with
the help of equations (22); thus

$$B = b_1\chi_1 + b_2\chi_2 + \cdots + b_m\chi_m. \qquad (23)$$

Similarly, we can express each of the quantities $B^2, B^3, \ldots, B^m$ as a linear function
of the χ's. From the m equations thus obtained, together with the equation
$\chi(P_1) = 1$, we can eliminate the m unknowns $\chi_1, \chi_2, \ldots, \chi_m$, obtaining as result
an algebraic equation of degree m for B,

$$B^m + c_1B^{m-1} + c_2B^{m-2} + \cdots + c_m = 0.$$

The m solutions of this equation give the m possible eigenvalues for B, each of
which will, according to (23), be a linear function of $b_1, b_2, \ldots, b_m$ whose coefficients
are a permissible set of values $\chi_1', \chi_2', \ldots, \chi_m'$. These sets of values χ' thus obtained
must be all different, since if there were fewer than m different permissible sets
of values χ' for the χ's, there would exist a linear function of the χ's every one
of whose eigenvalues vanishes, which would mean that the linear function itself
vanishes and the χ's are not linearly independent. Thus the number of permissible
sets of numerical values for the χ's is just equal to m, which is the number of
classes of permutations or the number of partitions of n. This number is therefore
the number of exclusive sets of states.
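The count of exclusive sets, one for each class of permutations, can be compared with the number of partitions of n. This brute-force sketch is an illustration under stated assumptions, not part of the text: it groups all n! permutations by cycle type (taken here as the criterion for two permutations being similar) and counts partitions with a hypothetical recursive helper.

```python
from itertools import permutations

def cycle_type(p):
    """Cycle lengths of p in decreasing order; similar permutations share a cycle type."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = p[i]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def partitions(n, largest=None):
    """Number of ways of writing n as a sum of positive integers, order ignored."""
    if largest is None:
        largest = n
    if n == 0:
        return 1
    # Choose the largest part k, then partition the remainder into parts no larger than k
    return sum(partitions(n - k, k) for k in range(1, min(n, largest) + 1))

for n in range(1, 7):
    classes = {cycle_type(p) for p in permutations(range(n))}
    assert len(classes) == partitions(n)
    print(n, len(classes))
```

For n = 1, ..., 6 the printed class counts are 1, 2, 3, 5, 7 and 11.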

The properties of the P's which are not properties of the χ's will only describe
the degeneracy of the states, in the case of a system whose Hamiltonian does
not involve the time explicitly. If ψ represents any stationary state, f(P)ψ,
where f(P) is any function of the permutations, will represent another stationary
state belonging to the same energy-level, except when it vanishes identically.
By expanding f(P)ψ in terms of a complete set of independent stationary states
belonging to this energy-level, we get a representation of f(P) and thus of each P.
In this way we see that, if we obtain a matrix representation of all the P's
consistent with each of the χ's being a certain number χ', then the number of
rows and columns of the matrices will be the degree of degeneracy of the states in
the exclusive set χ', i.e. the number of states belonging to each energy-level. This
degeneracy is an essential one and cannot be removed by any perturbation that is
symmetrical between all the similar particles. The states ψ and f(P)ψ are observationally
indistinguishable, since any observation that can actually be made must consist
in measuring an observable that is symmetrical between the similar particles and
therefore commutes with f(P). This remark applies also when the Hamiltonian
involves the time explicitly.


60. Determination of the Energy-levels 


Let us apply the perturbation method of §46 and make a first-order calculation
of the energy-levels in the case when the Hamiltonian does not involve 
the time explicitly. We suppose that for our unperturbed states each of the similar
particles has its own 'orbit', represented by a wave function (q'|α) involving only
the coordinates q' of this one particle. We shall have altogether n orbits, one for
each particle, which we assume for the present to be all different, and label
$\alpha_1, \alpha_2, \ldots, \alpha_n$. The wave function representing an unperturbed state of the whole system
will then be the product (10). If we apply an arbitrary permutation $P_a$ to the α's,
we shall obtain another wave function

$$(q_1'|\alpha_{r_1})(q_2'|\alpha_{r_2})\cdots(q_n'|\alpha_{r_n}) = (q'|P_a\alpha), \qquad (24)$$

where $r_1, r_2, \ldots, r_n$ is the arrangement of $1, 2, \ldots, n$ produced by $P_a$,
representing another unperturbed state with the same energy. There are thus
altogether n! unperturbed states with this energy, if we assume there are no other
causes of degeneracy. According to the method of §46, when the unperturbed
system is degenerate we must consider those elements of the matrix representing
the perturbing energy V that refer to two states with the same energy, i.e. those of
the type $(P_a\alpha|V|P_b\alpha)$, where $P_a$ and $P_b$ are two permutations of the α's. These will
form a matrix with n! rows and columns, whose eigenvalues are the first-order
corrections in the energy-levels.

It is necessary in the present discussion to distinguish between the two kinds of
permutations, those of the q's and those of the α's. The essential difference between
them can perhaps be seen most clearly in the following way. Let us consider
a permutation in the general sense, say that consisting of the interchange of 2
and 3. This may be interpreted either as the interchange of the objects 2 and 3
or as the interchange of the objects in the places 2 and 3, these two operations
producing in general quite different results. The first of these interpretations
is the one we have been using up to the present, the objects concerned being
the q's in the representative of a state. A permutation with this interpretation
can be applied to an arbitrary function of the q's. A permutation with the second
interpretation has a meaning, however, when applied to a function of the q's only
if each of the q's has a definite specifiable place in the function. This is not
the case for a general function of the q's, but it is the case for any of the n!
functions of the type (24), the place of each q being specified by the α with which
it is bracketed. Any permutation applied to the q's in given places now produces
the same result as the reciprocal permutation applied to the α's. A permutation
of the q's (i.e. one with the first interpretation), since it can be applied to any
function of the q's, i.e. to the representative of any state, may be regarded as
an ordinary dynamical variable. On the other hand, a permutation of places or of
the α's can be considered as a dynamical variable only in a very restricted sense,
since it has a meaning only when multiplied into a state whose representative is one
of the n! wave functions (24) or some linear combination of them. We denote such
a permutation of the α's, considered as a dynamical variable in this restricted sense,
by the symbol $P^\alpha$.
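The difference between interchanging objects and interchanging places is easy to see on a concrete arrangement. In the sketch below the arrangement [2, 3, 1] and both function names are hypothetical; places are counted from 1, as in the text.

```python
def interchange_objects(arrangement, x, y):
    """First interpretation: interchange the objects x and y wherever they stand."""
    return [y if v == x else x if v == y else v for v in arrangement]

def interchange_places(arrangement, i, j):
    """Second interpretation: interchange whatever occupies places i and j (counted from 1)."""
    out = list(arrangement)
    out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return out

arrangement = [2, 3, 1]                          # hypothetical arrangement of the objects 1, 2, 3
print(interchange_objects(arrangement, 2, 3))    # [3, 2, 1]
print(interchange_places(arrangement, 2, 3))     # [2, 1, 3]
# In the standard arrangement the two interpretations agree
print(interchange_objects([1, 2, 3], 2, 3) == interchange_places([1, 2, 3], 2, 3))  # True
```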
We can form algebraic functions of the dynamical variables $P^\alpha$ which will
be other dynamical variables in the same restricted sense. In particular
we can form $\chi(P^\alpha)$, the average of all $P^\alpha$'s similar to $P^\alpha$. This must
equal χ(P), the average of the similar permutations of the q's, since the total
set of all permutations of a given type must evidently be the same whether
the permutations are applied to the objects q or to the places α.

If we set up arbitrarily a one-one correspondence between the q's and the α's,
as is done automatically when we label both the q's and the α's by the numbers
1, 2, 3, ..., n, as in (10), then, if we have any permutation of the q's, we can give
a meaning to this same permutation of the α's. This meaning is such that

$$(q|\alpha) = (Pq|P\alpha).$$

In this equation we can apply a permutation $P_a$ to the α's on both sides, which will
give us

$$(q|P_a\alpha) = (Pq|P_aP\alpha), \qquad (25)$$

an equation which shows us the connexion between permutations of the q's and
those of the α's when applied to the wave function (24).

The matrix $(P_a\alpha|V|P_b\alpha)$, which we must now study, may be obtained
from the matrix $(q'|V|q'')$ representing V by a coordinate transformation,
in which the transformation functions are just $(q'|P_b\alpha)$, the wave function (24),
and its conjugate complex $(P_a\alpha|q')$, provided these functions are properly
normalized. Thus

$$(P_a\alpha|V|P_b\alpha) = \iint (P_a\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_b\alpha). \qquad (26)$$

Again, for arbitrary P,

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_aP\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_bP\alpha) = \iint (P_aP\alpha|Pq')\,dq'\,(Pq'|V|Pq'')\,dq''\,(Pq''|P_bP\alpha)$$

when we apply the permutation P to the variables of integration q' and q''.
With the help of (25), this reduces to

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_a\alpha|q')\,dq'\,(Pq'|V|Pq'')\,dq''\,(q''|P_b\alpha). \qquad (27)$$


Now since V is symmetrical between all the particles, we must have

$$(q'|V|q'') = (Pq'|V|Pq''),$$
like (1), and hence, comparing (26) and (27), we obtain

$$(P_a\alpha|V|P_b\alpha) = (P_aP\alpha|V|P_bP\alpha). \qquad (28)$$

Let $(P\alpha|V|\alpha) = V_P$ for brevity. Then, taking $P = P_b^{-1}$ in (28), we obtain

$$(P_a\alpha|V|P_b\alpha) = (P_aP_b^{-1}\alpha|V|\alpha) = V_{P_aP_b^{-1}}.$$


Thus the general matrix element $(P_a\alpha|V|P_b\alpha)$ depends only on the ratio $P_aP_b^{-1}$,
and of the total of $(n!)^2$ matrix elements there are only n! different ones.
The coefficient of any $V_P$ in the matrix will be a matrix, each of whose elements
is 0 or 1, the 1 occurring when

$$(P_a\alpha|V|P_b\alpha) = V_P,$$

i.e. when $P_aP_b^{-1} = P$. But the latter matrix, multiplied into any wave function
$(q|P_b\alpha)$, gives the result $(q|P_a\alpha)$ with $P_aP_b^{-1} = P$, i.e. it gives the result $(q|PP_b\alpha)$,
so that it is precisely the matrix representing the dynamical variable $P^\alpha$, or
the permutation P applied to the α's. Thus the whole matrix $(P_a\alpha|V|P_b\alpha)$ is
equal to the matrix representing $\sum_P V_PP^\alpha$, where the summation is over all the n!
permutations P, and we can put

$$V = \sum_P V_PP^\alpha. \qquad (29)$$


This formula shows that the perturbing energy V is equal to a linear function
of the permutation dynamical variables $P^\alpha$ with numerical coefficients $V_P$. It is, of
course, only an approximate formula, as it holds only with neglect of those matrix
elements of V that refer to two different energy-levels of the unperturbed system.
It can, however, be used for the calculation of the energy-levels in the first
approximation, and is very convenient for this purpose as the expression $\sum_P V_PP^\alpha$
is easily handled. This expression, it should be remembered, is a dynamical
variable only in the restricted sense mentioned above, but this sense is sufficiently
general for equation (29) to be valid with neglect of those matrix elements of V
referring to two different energy-levels of the unperturbed system.
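The argument leading to (29) can be checked numerically for three particles. The sketch below assumes hypothetical integer values for the n! numbers $V_P$, builds the matrix $(P_a\alpha|V|P_b\alpha) = V_{P_aP_b^{-1}}$ directly, and then rebuilds it as $\sum_P V_PP^\alpha$ from the 0/1 matrices described above; permutations are Python tuples and `compose`, `inverse` and `matrix_of` are illustrative helpers.

```python
import random
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p (permutations as tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

n = 3
group = list(permutations(range(n)))           # the n! permutations P_a of the orbits
rng = random.Random(0)
V = {P: rng.randint(-5, 5) for P in group}     # hypothetical numbers V_P

# Matrix element (P_a alpha|V|P_b alpha) = V_{P_a P_b^-1}
full = [[V[compose(Pa, inverse(Pb))] for Pb in group] for Pa in group]

def matrix_of(P):
    """0/1 matrix with a 1 where P_a P_b^-1 = P, i.e. in row P P_b of column P_b."""
    return [[1 if compose(Pa, inverse(Pb)) == P else 0 for Pb in group] for Pa in group]

size = len(group)
combined = [[0] * size for _ in range(size)]
for P in group:
    M = matrix_of(P)
    for i in range(size):
        for j in range(size):
            combined[i][j] += V[P] * M[i][j]

assert full == combined                                           # equation (29)
# Each row contains every V_P exactly once: only n! different elements among (n!)^2
assert all(sorted(row) == sorted(V.values()) for row in full)
# matrix_of(P) sends the basis state P_b alpha to P P_b alpha
for P in group:
    M = matrix_of(P)
    for j, Pb in enumerate(group):
        assert [M[i][j] for i in range(size)].index(1) == group.index(compose(P, Pb))
```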

As an example of an application of (29) we shall determine the average energy
of all those states, arising from a given state of the unperturbed system, that belong
to one exclusive set. This requires us to calculate the average eigenvalue of V when
the χ's have specified numerical values χ'. Now the average eigenvalue of $P^\alpha$ equals
that of $P_a^\alpha P^\alpha(P_a^\alpha)^{-1}$ for arbitrary $P_a^\alpha$, and thus equals that of $n!^{-1}\sum_a P_a^\alpha P^\alpha(P_a^\alpha)^{-1}$,
which is $\chi'(P^\alpha)$ or $\chi'(P)$. Hence the average eigenvalue of V is $\sum_P V_P\chi'(P)$.
A similar method could be used for calculating the average eigenvalue of any
function of V, it being necessary only to replace each $P^\alpha$ by $\chi'(P)$ to perform
the averaging.
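The average-eigenvalue rule $\sum_P V_P\chi'(P)$ can be tested for three particles in the n!-dimensional space spanned by the states $P_b\alpha$. This sketch goes beyond the text in one respect, stated here as an assumption: it isolates each exclusive set with the standard group-theoretical projector $e = (d/n!)\sum_P \chi(P^{-1})P^\alpha$, where χ is an irreducible character and d its dimension, so that the χ' of the text is taken to be χ/d. The values $V_P$ are random, symmetrized so that $V_P = V_{P^{-1}}$.

```python
import numpy as np
from itertools import permutations

def compose(p, q):
    """p∘q: apply q first, then p (permutations as tuples of images)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def sign(p):
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

n = 3
group = list(permutations(range(n)))
index = {P: i for i, P in enumerate(group)}

def regular(P):
    """Matrix of P^alpha on the n! states P_b alpha: it sends P_b alpha to P P_b alpha."""
    M = np.zeros((len(group), len(group)))
    for Pb in group:
        M[index[compose(P, Pb)], index[Pb]] = 1.0
    return M

rng = np.random.default_rng(1)
raw = {P: rng.normal() for P in group}
V = {P: 0.5 * (raw[P] + raw[inverse(P)]) for P in group}    # hypothetical V_P with V_P = V_{P^-1}
M = sum(V[P] * regular(P) for P in group)                    # equation (29)

# Irreducible characters of the permutations of three objects, with their dimensions
characters = {
    "symmetric": (1, {P: 1 for P in group}),
    "antisymmetric": (1, {P: sign(P) for P in group}),
    "two-dimensional": (2, {P: sum(P[i] == i for i in range(n)) - 1 for P in group}),
}

averages = {}
total_dim = 0
for name, (d, char) in characters.items():
    # Projector onto the exclusive set belonging to this character
    e = (d / len(group)) * sum(char[inverse(P)] * regular(P) for P in group)
    assert np.allclose(e @ e, e) and np.allclose(M @ e, e @ M)
    averages[name] = np.trace(M @ e) / np.trace(e)     # mean eigenvalue of V within the set
    chi_prime = {P: char[P] / d for P in group}         # chi' = character / dimension
    assert np.isclose(averages[name], sum(V[P] * chi_prime[P] for P in group))
    total_dim += int(round(float(np.trace(e))))
assert total_dim == len(group)     # the exclusive sets together span all n! states
```

For the symmetrical set the average reduces to $\sum_P V_P$, and for the antisymmetrical set to $\sum_P \pm V_P$ with the sign given by the parity of P.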

The number of energy-levels in an exclusive set χ = χ' that arise from a given
state of the unperturbed system is equal to the number of eigenvalues of (29) that
are consistent with the equations χ = χ'. This number is the number of rows
and columns in a representation of the P's in which each χ = χ', which number,
from the result at the end of the preceding section, is just the degree of degeneracy
of the states in this set.

The modifications required in the theory when the orbits $\alpha_1, \alpha_2, \ldots, \alpha_n$ of
the undisturbed system are not all different may easily be made. Suppose,
for example, that $\alpha_1$ and $\alpha_2$ are the same. Then the permutation $P_{12}^\alpha$ that causes
an interchange of $\alpha_1$ and $\alpha_2$ must equal unity. Only functions of the $P^\alpha$'s that
commute with $P_{12}^\alpha$ now have a meaning. This, however, is sufficient for us to be able
to follow out the same sort of argument as before and obtain a result of the same
form (29). The term in the summation in (29) that involves the permutation $P_{12}^\alpha$
now does not occur, since it could be added on to the term involving the identical
permutation $P_1^\alpha$. For the remaining terms, any two terms $P_a^\alpha$ and $P_b^\alpha$ must have
the same coefficient if the permutations $P_a^\alpha$ and $P_b^\alpha$ can be obtained from one
another by the interchange of $\alpha_1$ and $\alpha_2$. This results in $\sum_P V_PP^\alpha$ commuting
with $P_{12}^\alpha$, and thus having a meaning. The condition $P_{12}^\alpha = 1$ imposes restrictions
on the possible numerical values χ' that the χ's can have and reduces the number
of characters.


61. Application to Electrons 


Let us now consider the case when the similar particles are electrons. This requires,
according to Pauli's exclusion principle discussed in §57, that we take into account
only the antisymmetrical states. It is now necessary to make explicit reference
to the fact that electrons have spins, which show themselves through an angular
momentum and a magnetic moment. The effect of the spin on the motion of
an electron in an electromagnetic field is not very great. There are additional
forces on the electron due to its magnetic moment, requiring additional terms in
the Hamiltonian. The spin angular momentum does not have any direct action
on the motion, but it comes into play when there are forces tending to rotate
the magnetic moment, since the magnetic moment and angular momentum are
constrained to be always in the same direction. These effects are all small,
however, of the same order of magnitude as that of the relativistic variation of
mass with velocity, so there would be no point in taking them into account in
a non-relativistic theory. The importance of the spin lies not in these small effects
on the motion of the electron, but in the fact that it gives two internal states to
the electron, corresponding to the two possible values of the spin component in any
assigned direction, which causes a doubling in the number of independent states of 
an electron moving in a given field. This fact has far-reaching consequences when 
combined with Pauli’s exclusion principle. 

For the complete description of an electron we require the spin dynamical
variables σ, which were introduced in §19 and whose connexion with the spin
angular momentum was given in §39, together with the Cartesian coordinates
x, y, z and momenta $p_x, p_y, p_z$. The spin dynamical variables are assumed
to commute with these coordinates and momenta. Thus a complete set of
commuting observables for a system consisting of a single electron will be
$x, y, z, \sigma_z$. In a representation in which these are diagonal, the representative
of any state will be a function of four variables $x', y', z', \sigma_z'$. Since $\sigma_z'$ has a domain
consisting of only two points, namely 1 and −1, this function of four variables is
the same as two functions of three variables, namely the two functions

$$(x'y'z'|)_+ = (x'y'z'\,1|), \qquad (x'y'z'|)_- = (x'y'z'\,{-1}|).$$


Thus the presence of the spin may be considered either as introducing a new variable
into the representative of a state or as giving this representative two components.
In our present work on the theory of several electrons, we shall consider
the spins as giving extra variables in the representatives of states. For brevity,
we shall write the single variable $x_r$ instead of $x_r, y_r, z_r$ for the coordinates
of the r-th electron and shall omit the suffix z from $\sigma_{zr}$ when it occurs in
representatives. Thus the representative of a state when there are n electrons
will be written

$$(x_1\sigma_1\,x_2\sigma_2\ldots x_n\sigma_n|) = (x\sigma|) \qquad (30)$$


for brevity. The exclusion principle requires that (30) shall be antisymmetrical
in the x's and σ's together, i.e. if any permutation is applied to the x's and
also to the σ's, (30) must remain unchanged or change sign according to whether
the permutation is even or odd. In symbols

$$(x\sigma|) = \pm(Px\,P\sigma|) \qquad (31)$$

for any permutation P. Thus even if we neglect the spin forces in the Hamiltonian,
we must take the spin variables into account in order to determine what states are
allowed by the exclusion principle.

If the theory of the three preceding sections is applied directly to the case
of electrons, it will not give anything of interest, since all the allowed states
are eigenstates of any permutation belonging to the eigenvalue ±1. We may,
however, consider permutations P which operate on the x-variables alone in
the representative of a state, and apply our theory to these. Such permutations
may also be considered as dynamical variables. Further, they are also constants
of the motion when we neglect the terms in the Hamiltonian that arise from
the spin forces, since this neglect results in the Hamiltonian not involving the spin
dynamical variables σ at all. Hence with these permutations P we can again
introduce the χ's, equal to the average of all of the P's in each class, and assert
that for any permissible set of numerical values χ' for the χ's there will be one
exclusive set of states. Thus there exist these exclusive sets of states for systems
containing many electrons even when we restrict ourselves to a consideration of
only those states that satisfy Pauli's principle. The exclusiveness of the sets of
states is now, of course, only approximate, since the χ's are constants only so
long as we neglect the spin forces. There will actually be a small probability for
a transition from a state in one set to a state in another.
From (31) we obtain

$$PP^\sigma = \pm 1, \qquad (32)$$

where P denotes any permutation which operates on the x-variables and $P^\sigma$
the same permutation operating on the σ-variables in the representative of a state.
There is thus a simple connexion between the P's and $P^\sigma$'s, which means that
instead of studying the dynamical variables P we can get all the results we want,
e.g. the characters χ', by studying the dynamical variables $P^\sigma$. The $P^\sigma$'s are
much easier to study on account of the fact that the σ variables in the wave
function have domains consisting each of only the two points 1 and −1, which are
the two eigenvalues of each $\sigma_r$. This fact results in there being fewer characters χ'
for the group of permutations of the σ-variables than for the group of general
permutations, since it prevents a function of the variables $\sigma_1, \sigma_2, \ldots, \sigma_n$ from
being antisymmetrical in more than two of them.
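Relation (32) can be illustrated for two electrons with a discretized representative. In this sketch the array axes stand for $(x_1, \sigma_1, x_2, \sigma_2)$, each x axis has a hypothetical number of sample points, the two entries on each σ axis stand for σ_z = 1 and −1, and an antisymmetrical function is produced by subtracting the fully interchanged array, as (31) requires.

```python
import numpy as np

rng = np.random.default_rng(0)
grid = 4                                # hypothetical number of sample points for each x_r
# Array axes: (x1, sigma1, x2, sigma2)
phi = rng.normal(size=(grid, 2, grid, 2))
psi = phi - phi.transpose(2, 3, 0, 1)   # antisymmetrical under interchanging (x1 sigma1) with (x2 sigma2)

def P_x(f):
    """Interchange the coordinate arguments x1 and x2 only."""
    return f.transpose(2, 1, 0, 3)

def P_sigma(f):
    """Interchange the spin arguments sigma1 and sigma2 only."""
    return f.transpose(0, 3, 2, 1)

# P P^sigma = -1 for an interchange, as in (32)
assert np.allclose(P_x(P_sigma(psi)), -psi)
# P alone is not fixed by the exclusion principle for this (generic) psi
assert not np.allclose(P_x(psi), psi) and not np.allclose(P_x(psi), -psi)
```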

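The study of the dynamical variables $P^\sigma$ is made specially easy by the fact that
we can express them as algebraic functions of the dynamical variables σ. Consider
the quantity

$$O_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$

With the help of equations (54) and (55) of §19 we find readily that

$$(\sigma_1, \sigma_2)^2 = (\sigma_{x1}\sigma_{x2} + \sigma_{y1}\sigma_{y2} + \sigma_{z1}\sigma_{z2})^2 = 3 - 2(\sigma_1, \sigma_2), \qquad (33)$$

and hence that

$$O_{12}^2 = \tfrac14\{1 + 2(\sigma_1, \sigma_2) + (\sigma_1, \sigma_2)^2\} = 1. \qquad (34)$$

Again, we find

$$O_{12}\sigma_{x1} = \tfrac12\{\sigma_{x1} + \sigma_{x2} - i\sigma_{z1}\sigma_{y2} + i\sigma_{y1}\sigma_{z2}\},$$

and

$$\sigma_{x2}O_{12} = \tfrac12\{\sigma_{x2} + \sigma_{x1} + i\sigma_{y1}\sigma_{z2} - i\sigma_{z1}\sigma_{y2}\},$$

and hence $O_{12}\sigma_{x1} = \sigma_{x2}O_{12}$. Similar relations hold for $\sigma_y$ and $\sigma_z$, so that we have

$$O_{12}\sigma_1 = \sigma_2O_{12}$$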
or $O_{12}\sigma_1O_{12}^{-1} = \sigma_2$. From this we can obtain with the help of (34)

$$O_{12}\sigma_2O_{12}^{-1} = \sigma_1.$$

These commutation relations for $O_{12}$ with $\sigma_1$ and $\sigma_2$ are precisely the same as
those for $P_{12}^\sigma$, the permutation consisting of the interchange of the spin variables
of electrons 1 and 2. Thus we can put

$$O_{12} = cP_{12}^\sigma,$$

where c is a number. Equation (34) shows that c = ±1. To determine which
of these values for c is the correct one, we observe that the eigenvalues of $P_{12}^\sigma$
are 1, 1, 1, −1, corresponding to the fact that there exist three independent
symmetrical functions and one antisymmetrical function of the two variables $\sigma_{z1}, \sigma_{z2}$,
namely, with the notation of §19, the states represented by the three symmetrical
functions $f_\alpha(\sigma_{z1})f_\alpha(\sigma_{z2})$, $f_\beta(\sigma_{z1})f_\beta(\sigma_{z2})$, $f_\alpha(\sigma_{z1})f_\beta(\sigma_{z2}) + f_\beta(\sigma_{z1})f_\alpha(\sigma_{z2})$ and the one
antisymmetrical function $f_\alpha(\sigma_{z1})f_\beta(\sigma_{z2}) - f_\beta(\sigma_{z1})f_\alpha(\sigma_{z2})$. Thus the mean of
the eigenvalues of $P_{12}^\sigma$ is ½. Now the mean of the eigenvalues of $(\sigma_1, \sigma_2)$ is evidently
zero and hence the mean of the eigenvalues of $O_{12}$ is ½. Thus we must have c = +1,
and so we can put

$$P_{12}^\sigma = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$
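The results (33) and (34), the relation $O_{12}\sigma_1 = \sigma_2O_{12}$ and the identification of $O_{12}$ with $P_{12}^\sigma$ can be verified with explicit 2 × 2 matrices. The sketch assumes the usual matrices for $\sigma_x, \sigma_y, \sigma_z$ (with $\sigma_z$ diagonal) and orders the four joint spin states so that electron 1 is the first Kronecker factor; `spin1`, `spin2` and `swap` are illustrative names.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

def spin1(s):
    return np.kron(s, I2)    # acts on the spin of electron 1

def spin2(s):
    return np.kron(I2, s)    # acts on the spin of electron 2

dot = sum(spin1(s) @ spin2(s) for s in (sx, sy, sz))     # (sigma_1, sigma_2)
O12 = 0.5 * (np.eye(4) + dot)

# Interchange of the two spin variables: basis state |a b> -> |b a>, a, b in {0, 1}
swap = np.zeros((4, 4))
for a in range(2):
    for b in range(2):
        swap[2 * b + a, 2 * a + b] = 1.0

assert np.allclose(dot @ dot, 3 * np.eye(4) - 2 * dot)   # equation (33)
assert np.allclose(O12 @ O12, np.eye(4))                 # equation (34)
for s in (sx, sy, sz):
    assert np.allclose(O12 @ spin1(s), spin2(s) @ O12)    # O12 sigma_1 = sigma_2 O12
assert np.allclose(O12, swap)                            # O12 is P^sigma_12, with c = +1
assert np.allclose(np.linalg.eigvalsh(O12), [-1, 1, 1, 1])
```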

In this way any permutation $P^\sigma$ consisting simply of an interchange can be
expressed as an algebraic function of the σ's. Any other permutation $P^\sigma$ can
be expressed as a product of interchanges and can therefore also be expressed
as a function of the σ's. With the help of (32) we can now express the P's as
algebraic functions of the σ's and eliminate the $P^\sigma$'s from the discussion. We have,
since the − sign must be taken in (32) when the permutations are interchanges
and since the square of an interchange is unity,

$$P_{12} = -\tfrac12\{1 + (\sigma_1, \sigma_2)\}. \qquad (35)$$


The formula (35) may conveniently be used for the evaluation of
the characters χ' which define the exclusive sets of states. We have, for example,
for the permutations consisting of interchanges,

$$\chi_{12} = \chi(P_{12}) = -\frac12\left\{1 + \frac{2}{n(n-1)}\sum_{r<s}(\sigma_r, \sigma_s)\right\}.$$

If we introduce the dynamical variable s to describe the magnitude of the total
spin angular momentum, $\tfrac12\sum_r\sigma_r$ in units of ħ, through the formula

$$s(s+1) = \Big(\tfrac12\sum_r\sigma_r\Big)^2,$$
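in agreement with (12) of Chapter VII, we have

$$2\sum_{r<s}(\sigma_r, \sigma_s) = \sum_{r,s}(\sigma_r, \sigma_s) - \sum_r(\sigma_r, \sigma_r) = 4s(s+1) - 3n.$$

Hence

$$\chi_{12} = -\frac12\left\{1 + \frac{4s(s+1) - 3n}{n(n-1)}\right\} = -\frac{n(n-4) + 4s(s+1)}{2n(n-1)}. \qquad (36)$$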


Thus $\chi_{12}$ is expressible as a function of the dynamical variable s and of n,
the number of electrons. Any of the other χ's could be evaluated on similar
lines and would have to be a function of s and n only, since there are no other
symmetrical functions of all the σ dynamical variables which could be involved.
There is therefore one set of numerical values χ' for the χ's, and thus one exclusive
set of states, for each eigenvalue s' of s. The eigenvalues of s are

$$\tfrac12 n, \quad \tfrac12 n - 1, \quad \tfrac12 n - 2, \quad \ldots,$$

the sequence† terminating with 0 or ½.

† [‘sequence’ is used for ‘series’]
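Formula (36) is an operator identity, with s(s + 1) standing for $(\tfrac12\sum_r\sigma_r)^2$, so it can be checked as a matrix equation for a few electrons. The sketch below assumes the standard Pauli matrices and builds each $\sigma_r$ by Kronecker products; it also lists $\chi_{12}'$ for the allowed eigenvalues s' using exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations
import numpy as np

PAULI = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
)

def on_electron(op, r, n):
    """op acting on the spin of electron r (counted from 0), unit operator on the others."""
    return reduce(np.kron, [op if k == r else np.eye(2) for k in range(n)])

def check_36(n):
    dim = 2 ** n
    sigma = [[on_electron(s, r, n) for s in PAULI] for r in range(n)]
    pairs = list(combinations(range(n), 2))
    # chi_12: average over all interchanges P_rs = -(1 + (sigma_r, sigma_s)) / 2, from (35)
    chi12 = -sum(np.eye(dim) + sum(a @ b for a, b in zip(sigma[r], sigma[s]))
                 for r, s in pairs) / (2 * len(pairs))
    total = [sum(sigma[r][k] for r in range(n)) / 2 for k in range(3)]   # (1/2) sum_r sigma_r
    s_sq = sum(t @ t for t in total)                                     # the operator s(s + 1)
    rhs = -(n * (n - 4) * np.eye(dim) + 4 * s_sq) / (2 * n * (n - 1))    # right side of (36)
    return np.allclose(chi12, rhs)

def spin_values(n):
    """Eigenvalues s' of s: n/2, n/2 - 1, ..., ending with 0 or 1/2."""
    return [Fraction(n, 2) - k for k in range(n // 2 + 1)]

def chi12_value(n, s):
    """chi_12' from (36) for a given eigenvalue s' = s."""
    return -(n * (n - 4) + 4 * s * (s + 1)) / (2 * n * (n - 1))

for n in range(2, 6):
    assert check_36(n)
    print(n, [(str(s), str(chi12_value(n, s))) for s in spin_values(n)])
```

For two electrons this gives $\chi_{12}' = -1$ for s' = 1 and $+1$ for s' = 0, i.e. spatially antisymmetrical and symmetrical states respectively.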

We see in this way that each of the stationary states of a system with several
electrons is an eigenstate of s, the magnitude in units of ħ of the total spin angular
momentum $\tfrac12\sum_r\sigma_r$, belonging to a definite eigenvalue s'. For any given s' there will
be 2s' + 1 possible values for a component of the total spin vector in any direction,
and these will correspond to 2s' + 1 independent stationary states with the same
energy. When we do not neglect the forces due to the spin magnetic moments,
these 2s' + 1 states will in general be split up into 2s' + 1 states with slightly
different energies, and will thus form a multiplet of multiplicity 2s' + 1. Transitions
in which s' changes, i.e. transitions from one multiplicity to another, cannot occur
when the spin forces are neglected and will have only a small probability of
occurrence when the spin forces are not neglected.

We can determine the energy-levels of a system with several electrons to the first
approximation by using formula (29). If we consider only the Coulomb forces
between the electrons, then the interaction energy V will consist of a sum of
parts each referring to only two electrons, which will result in all the matrix
elements $V_P$ vanishing except those for which P is the identical permutation or is
simply an interchange of two electrons. Thus (29) will reduce to

$$V = V_1 + \sum_{r<s} V_{rs}P_{rs}^\alpha, \qquad (37)$$

$V_{rs}$ being the matrix element referring to the interchange of orbits r and s.
Since the $P^\alpha$'s have the same properties as the P's, any function of the $P^\alpha$'s
will have the same eigenvalues as the corresponding function of the P's, so that
the right-hand side of (37) will have the same eigenvalues as

$$V_1 + \sum_{r<s} V_{rs}P_{rs}$$

or

$$V_1 - \tfrac12\sum_{r<s} V_{rs}\{1 + (\sigma_r, \sigma_s)\} \qquad (38)$$

from (35). The eigenvalues of (38) will give the first-order corrections in
the energy-levels. The form of (38) shows that a model which assumes a coupling
energy between the spins of the various electrons, of magnitude $-\tfrac12V_{rs}(\sigma_r, \sigma_s)$ for
the electrons in the r and s orbits, would meet with a fair amount of success.
This coupling energy is much greater than that of the spin magnetic moments.
Such models of the atom were in use before the justification by quantum mechanics
was obtained.
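For two electrons, (38) can be diagonalized directly. In this sketch $V_1$ and $V_{12}$ are hypothetical numbers in arbitrary energy units; the three s = 1 states come out at $V_1 - V_{12}$ and the single s = 0 state at $V_1 + V_{12}$, in agreement with the average $V_1 + V_{12}\chi_{12}'$ and the values $\chi_{12}' = -1$ and $+1$ given by (36).

```python
import numpy as np

PAULI = (
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]]),
    np.array([[1, 0], [0, -1]], dtype=complex),
)
I2 = np.eye(2)
dot = sum(np.kron(s, I2) @ np.kron(I2, s) for s in PAULI)     # (sigma_1, sigma_2)

V1, V12 = 2.0, 0.3            # hypothetical values of V_1 and V_12
energy = V1 * np.eye(4) - 0.5 * V12 * (np.eye(4) + dot)       # expression (38) for n = 2
levels = np.linalg.eigvalsh(energy)

# s = 1 (three states) gives V_1 - V_12; s = 0 (one state) gives V_1 + V_12
expected = sorted([V1 - V12] * 3 + [V1 + V12])
assert np.allclose(levels, expected)
```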

If two of the orbits of our unperturbed system are the same, say the orbits $\alpha_1$
and $\alpha_2$, we must take only those eigenvalues of (37) that are consistent
with $P_{12}^\alpha = 1$, or those eigenvalues of (38) consistent with $P_{12} = 1$ or $P_{12}^\sigma = -1$.
This means we must take only those eigenvalues of (38) belonging to eigenfunctions
that are simultaneously eigenfunctions of $P_{12}^\sigma$ belonging to the eigenvalue −1,
i.e. eigenfunctions that are antisymmetrical in $\sigma_1$ and $\sigma_2$. Thus we may say that
the two electrons in the orbits $\alpha_1$ and $\alpha_2$ have their spins antiparallel. The case of
more than two orbits the same cannot occur with electrons.


XI. THEORY OF RADIATION 


62. Second Quantization 


WE shall begin this chapter by considering some general properties of an assembly
of n similar systems of any kind that satisfy the Einstein-Bose statistics.
If we take a representation in which sets of observables $q_1, q_2, \ldots, q_n$, describing
the first, second, ..., last system respectively, are diagonal, the representative
$(q_1'q_2'\ldots q_n'|)$ of any state must be symmetrical in the variables $q_1', q_2', \ldots, q_n'$.
Suppose the eigenvalues of any of the q's, $q_r$ say, are $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$,
which we assume for definiteness to be discrete. These eigenvalues must be
the same for each of the systems, i.e. they must be independent of r. (They will
each be in general a set of numbers, consisting of an eigenvalue of each of
the set of commuting variables $q_r$.) If we now have any symmetrical function
of the variables $q_1', q_2', \ldots, q_n'$, each point in the domain of this function can
be specified by $n_1', n_2', n_3', \ldots$, the numbers of q''s equal to $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$
respectively. The variables $n_1', n_2', n_3', \ldots$ will do just as well as the variables
$q_1', q_2', \ldots, q_n'$ so long as we are dealing only with symmetrical functions.
Thus the representatives of states of our assembly satisfying the Einstein-Bose
statistics may be expressed as functions of the variables $n_1', n_2', n_3', \ldots$ instead of
the variables $q_1', q_2', \ldots, q_n'$. This change is effectively a transformation to a new
representation in which the rows and columns of the matrices are labelled by
the observables $n_1, n_2, n_3, \ldots$, which observables are the numbers of systems with
q's equal to $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$ respectively, or, as we may say, the numbers of systems
in the states $q^{(1)}, q^{(2)}, q^{(3)}, \ldots$.

Since the new observables $n_1, n_2, n_3, \ldots$ are functions of the $q_1, q_2, \ldots, q_n$
(non-analytic functions, it is true), the transformation is of the trivial kind
consisting essentially of a relabelling of the rows and columns, and the only change
to be made in the representative of a state will be that arising from the change in
the weights of the different points of its domain. To determine this change we use
the condition

$$\sum_{n_1n_2\ldots}|(n_1n_2\ldots|)|^2 = \sum_{q_1q_2\ldots q_n}|(q_1q_2\ldots q_n|)|^2,$$
from which we can infer that

$$|(n_1n_2\ldots|)|^2 = \sum |(q_1q_2\ldots q_n|)|^2, \qquad (1)$$

the summation in (1) being over all values of the q's such that $n_1$ of them are
equal to $q^{(1)}$, $n_2$ equal to $q^{(2)}$, and so on. The number of terms in the summation
in (1) is $n!/(n_1!n_2!n_3!\ldots)$ and they are all equal, on account of $(q_1q_2\ldots q_n|)$ being
symmetrical. It is thus clear that we must take

$$(n_1n_2\ldots|) = [n!/(n_1!n_2!n_3!\ldots)]^{1/2}\,(q_1q_2\ldots q_n|). \qquad (2)$$
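The normalization rule (2) can be checked on a small example. This sketch assumes three systems whose q takes two values, builds a random symmetrical q-representative by summing over all rearrangements of its arguments, converts it with (2), and compares the two sums of squares appearing in the condition above. The helper `occupations` is hypothetical.

```python
import math
import random
from itertools import permutations, product

def occupations(qs, k):
    """Numbers n_1, ..., n_k of systems whose q has each of the k possible values."""
    return tuple(qs.count(a) for a in range(k))

n, k = 3, 2                  # hypothetical: 3 systems, each q taking 2 values
rng = random.Random(0)
base = {qs: rng.uniform(-1, 1) for qs in product(range(k), repeat=n)}
# Symmetrize so that the q-representative is symmetrical in q_1, ..., q_n
psi_q = {qs: sum(base[tuple(qs[i] for i in perm)] for perm in permutations(range(n)))
         for qs in base}

psi_n = {}
for qs, amp in psi_q.items():
    occ = occupations(qs, k)
    weight = math.factorial(n) / math.prod(math.factorial(m) for m in occ)
    psi_n[occ] = math.sqrt(weight) * amp             # equation (2)

norm_q = sum(a * a for a in psi_q.values())
norm_n = sum(a * a for a in psi_n.values())
assert math.isclose(norm_q, norm_n)
```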


We must now obtain the transformation law for the representatives 
of dynamical variables from the g-representation to the n-representation. 
As this problem is rather complicated for a general dynamical variable, we shall 
here deal only with the special caseé when the dynamical variable is of the form 


USS Ue (3) 


U,. being a function only of the variables describing the r-th system and the form 
of U, in terms of these variables being the same for all r, so as to make U 
symmetrical between all the systems, as it must be if it is to have any physical 
significance. The representative of U, in the q,-representation will be (d/.|U,|q’), 
which will be a matrix independent of r, i.e. the same for each of the n systems. 
Its elements may also be written (q®|U|q) or Uz» for brevity. The representative 
of U in the complete q-representative will thus be 


(q19 racy qn lU |qi as sf as dn) = So (GU eld) 5a of Saat: . Oat salt 19d. yyaty: . Og! git: (4) 


r 


A convenient way of transforming this representative to the $n$-representation is to take the equation

$$\chi = U\psi \tag{5}$$

and transform the representative of this equation. From (4), this equation will be represented in the $q$-representation by


$$(q_1 q_2 \ldots q_n|\chi) = \sum_r (q_r|U_r|q_r)\,(q_1 q_2 \ldots q_n|) + \sum_r \sum_{q_r' \neq q_r} (q_r|U_r|q_r')\,(q_1 q_2 \ldots q_{r-1}\, q_r'\, q_{r+1} \ldots q_n|), \tag{6}$$


† The general case has been dealt with by Pascual Jordan (1927), „Über Wellen und Korpuskeln in der Quantenmechanik“, Zeitschrift für Physik, 45 (11-12), pp. 766-775. doi:10.1007/bf01329554


62. Second Quantization 221 


the terms arising from the diagonal matrix elements of U being separated from 
the non-diagonal ones for convenience later. If we now make the transformation 
to the n-representation, using equation (2), equation (6) becomes 


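$$(n_1 n_2 \ldots|\chi) = \sum_r (q_r|U_r|q_r)\,(n_1 n_2 \ldots|) + \sum_r \sum_{q_r' \neq q_r} \left[\frac{n_{q_r'} + 1}{n_{q_r}}\right]^{1/2} (q_r|U_r|q_r')\,(n_1 n_2 \ldots n_{q_r} - 1 \ldots n_{q_r'} + 1 \ldots|) \tag{7}$$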


after removal of the factor $[(n_1!\,n_2!\,n_3!\ldots)/n!]^{1/2}$ throughout. The sum $\sum_r (q_r|U_r|q_r)$ in (7) means a sum of terms each of the type $(q^{(a)}|U|q^{(a)})$ or $U_{aa}$. The number of times this typical term occurs is the number of $q$'s that equal $q^{(a)}$, which is just $n_a$. Thus this sum is equal to $\sum_a n_a U_{aa}$. Again, the double sum $\sum_r \sum_{q_r' \neq q_r}$ in (7) consists of terms each of the type $[(n_b + 1)/n_a]^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|)$ with $b \neq a$. The number of times this typical term occurs is equal to the number of ways of choosing $r$ and $q_r'$ such that $q_r = q^{(a)}$ and $q_r' = q^{(b)}$. This is just $n_a$, the number of ways of choosing $r$ such that $q_r = q^{(a)}$, since there is always just one way of choosing $q_r' = q^{(b)}$. Equation (7) thus reduces to


$$(n_1 n_2 \ldots|\chi) = \sum_a n_a U_{aa}\,(n_1 n_2 \ldots|) + \sum_a \sum_{b \neq a} n_a^{1/2}(n_b + 1)^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|), \tag{8}$$
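which may be written

$$(n_1 n_2 \ldots|\chi) = \sum_{a,b} n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}\,U_{ab}\,(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|), \tag{9}$$

if by $(n_1 n_2 \ldots n_a - 1 \ldots n_b + 1 \ldots|)$ when $b = a$ we understand simply $(n_1 n_2 \ldots|)$.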
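The reduction from (6) to (9) can be tested the same way. The sketch below uses hypothetical sizes, a hypothetical matrix `U` and hypothetical amplitudes, and its helper names are illustrative only. In it, `chi_q` applies $\sum_r U_r$ directly to a symmetric $q$-representative as in (6). The result is carried over to the $n$-representation with the factor of (2). It is then compared with the double sum over $a, b$ with coefficients $n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}U_{ab}$.

```python
from collections import Counter
from itertools import product
from math import factorial, prod, sqrt

N_SYSTEMS, N_STATES = 3, 3  # hypothetical small assembly

# Hypothetical single-system matrix U_ab, Hermitian and the same for every r.
U = [[1.0, 0.2 + 0.1j, -0.3],
     [0.2 - 0.1j, 0.5, 0.4j],
     [-0.3, -0.4j, -0.7]]


def occupation(qs):
    counts = Counter(qs)
    return tuple(counts[a] for a in range(N_STATES))


def weight(ns):
    """[n!/(n_1! n_2! ...)]^{1/2}, the factor in equation (2)."""
    return sqrt(factorial(sum(ns)) / prod(factorial(k) for k in ns))


def psi_n(ns):
    """Hypothetical n-representative of the psi-symbol."""
    return complex(ns[0] - ns[1], 1 + ns[2])


def psi_q(qs):
    """Symmetric q-representative, related to psi_n by equation (2)."""
    ns = occupation(qs)
    return psi_n(ns) / weight(ns)


def chi_q(qs):
    """q-representative of U psi with U = sum_r U_r, as in equation (6)."""
    total = 0j
    for r in range(N_SYSTEMS):
        for q_new in range(N_STATES):
            moved = qs[:r] + (q_new,) + qs[r + 1:]
            total += U[qs[r]][q_new] * psi_q(moved)
    return total


def chi_n_formula(ns):
    """n-representative of U psi from the right-hand side of equation (9)."""
    total = 0j
    for a, b in product(range(N_STATES), repeat=2):
        if ns[a] == 0:
            continue  # the factor n_a^{1/2} vanishes
        moved = list(ns)
        moved[a] -= 1
        moved[b] += 1  # for b == a this restores the original occupation
        coeff = sqrt(ns[a] * (ns[b] + 1 - (a == b)))
        total += coeff * U[a][b] * psi_n(tuple(moved))
    return total


# Transform the direct q-result with equation (2) and compare with (9).
max_error = max(
    abs(weight(occupation(qs)) * chi_q(qs) - chi_n_formula(occupation(qs)))
    for qs in product(range(N_STATES), repeat=N_SYSTEMS)
)
```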
The eigenvalues of each of our new dynamical variables $n_1, n_2, \ldots$ are the integers $0, 1, 2, 3, \ldots$. They are thus the same, apart from the factor $h$, as those of the action variable $J$ in the problem of the simple harmonic oscillator, when the arbitrary additive constant in this action variable is chosen as in equation (57) of §36. Hence each $n_a$ is a dynamical variable of the same nature as the action variable of a simple harmonic oscillator, and we can introduce an angle variable $w_a$ canonically conjugate to it, or rather we can introduce $e^{iw_a}$ and $e^{-iw_a}$. Corresponding to equations (59) of §36 we shall have

$$e^{iw_a} n_a = (n_a - 1)\,e^{iw_a}, \qquad e^{-iw_a} n_a = (n_a + 1)\,e^{-iw_a}. \tag{10}$$

Also we have that $e^{iw_a}$, $e^{-iw_a}$ and $n_a$ commute with $e^{iw_b}$, $e^{-iw_b}$ and $n_b$ for $b \neq a$.
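The substitution rules stated for $e^{iw_a}$ and $e^{-iw_a}$ in the next paragraph can be mimicked with a small Python sketch. It is an illustration only, with hypothetical helper names, and it stores a state of one variable $n_a$ as a dictionary of ket amplitudes. In ket language $e^{iw_a}$ raises the occupation and $e^{-iw_a}$ lowers it (annihilating $n_a = 0$). This is the same statement as saying that their effect on representatives is the substitution of $n_a - 1$ and $n_a + 1$ respectively. The sketch checks the rules $e^{iw_a}n_a = (n_a - 1)e^{iw_a}$ and $e^{-iw_a}n_a = (n_a + 1)e^{-iw_a}$ on basis states. This sign convention is assumed here because it is the one consistent with those substitution rules.

```python
# Kets |n> of a single variable n_a are stored as {n: amplitude} dictionaries.


def number(state):
    """Multiply by n."""
    return {n: n * amp for n, amp in state.items()}


def plus_const(state, c):
    """Multiply by (n + c)."""
    return {n: (n + c) * amp for n, amp in state.items()}


def exp_iw(state):
    """e^{iw}: takes |n> to |n+1>, so representatives get n - 1 substituted."""
    return {n + 1: amp for n, amp in state.items()}


def exp_minus_iw(state):
    """e^{-iw}: takes |n> to |n-1> and annihilates |0>."""
    return {n - 1: amp for n, amp in state.items() if n > 0}


def same(s1, s2, tol=1e-12):
    keys = set(s1) | set(s2)
    return all(abs(s1.get(k, 0) - s2.get(k, 0)) < tol for k in keys)


basis = [{m: 1.0} for m in range(8)]

# e^{iw} n = (n - 1) e^{iw}   and   e^{-iw} n = (n + 1) e^{-iw}
rule_plus = all(same(exp_iw(number(s)), plus_const(exp_iw(s), -1)) for s in basis)
rule_minus = all(same(exp_minus_iw(number(s)), plus_const(exp_minus_iw(s), 1)) for s in basis)

# Representatives: (n|e^{-iw} psi) = (n+1|psi); (n|e^{iw} psi) = (n-1|psi), zero at n = 0.
psi = {0: 0.3, 1: -1.2, 2: 0.5j, 3: 2.0}  # hypothetical representative
e_minus_psi = exp_minus_iw(psi)
e_plus_psi = exp_iw(psi)
```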




The new dynamical variables $e^{iw_a}$ and $e^{-iw_a}$ are defined by their matrix representatives in a representation in which $n_a$ is diagonal, like the $e^{iw}$ and $e^{-iw}$ of §36. From the form of these matrix representatives it follows that when $e^{-iw_a}$ is multiplied into a $\psi$-symbol whose representative is $(n_1 n_2 \ldots n_a \ldots|)$, the representative of the product is

$$(n_1 n_2 \ldots n_a + 1 \ldots|),$$

and when $e^{iw_a}$ is multiplied into this $\psi$-symbol, the representative of the product is

$$(n_1 n_2 \ldots n_a - 1 \ldots|) \quad \text{for } n_a \geq 1, \qquad 0 \quad \text{for } n_a = 0.$$

This means that when $e^{-iw_a}$ and $e^{iw_a}$ are multiplied into $\psi$-symbols, they are equivalent to the operations of substitution of $n_a + 1$ and $n_a - 1$ for $n_a$ respectively, the second substitution being understood to give the result zero for $n_a = 0$. We can now see that equation (9) is just the representative of
now see that equation (9) is just the representative of 


$$\chi = \sum_{a,b} n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}\,U_{ab}\,e^{iw_a} e^{-iw_b}\,\psi. \tag{11}$$


Equation (11) must hold whenever (5) holds and hence 


$$U = \sum_{a,b} n_a^{1/2}(n_b + 1 - \delta_{ab})^{1/2}\,U_{ab}\,e^{iw_a} e^{-iw_b} = \sum_{a,b} n_a^{1/2} e^{iw_a}\,U_{ab}\,(n_b + 1)^{1/2} e^{-iw_b}, \tag{12}$$


with the help of (10). This gives us $U$ in terms of the new variables $n$ and their conjugates, and provides us immediately with the representative of $U$ in the new representation. The $U_{ab}$ here are, of course, just numerical coefficients. We can put the result (12) into a simpler form by introducing the dynamical variables

$$\xi_a = (n_a + 1)^{1/2} e^{-iw_a} = e^{-iw_a} n_a^{1/2}$$

and their conjugate complexes

$$\bar{\xi}_a = e^{iw_a} (n_a + 1)^{1/2} = n_a^{1/2} e^{iw_a}. \tag{13}$$
These dynamical variables are of the form of (61) of Chapter VI and correspond, apart from numerical coefficients, to $p - iq$ and $p + iq$ in the problem of the simple harmonic oscillator. We have

$$\xi_a \bar{\xi}_a = n_a + 1, \qquad \bar{\xi}_a \xi_a = n_a, \tag{14}$$




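and thus, since variables with different suffixes $a$ commute, the complete set of quantum conditions for the $\xi$'s and $\bar{\xi}$'s is

$$\xi_a \xi_b - \xi_b \xi_a = 0, \qquad \bar{\xi}_a \bar{\xi}_b - \bar{\xi}_b \bar{\xi}_a = 0, \qquad \xi_a \bar{\xi}_b - \bar{\xi}_b \xi_a = \delta_{ab}. \tag{15}$$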
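Relations (14) and (15) can be spot-checked with a minimal sketch. It assumes, as in the text, that $\xi_a = e^{-iw_a}n_a^{1/2}$ substitutes $n_a + 1$ into representatives; in ket language, $\xi_a$ lowers $n_a$ with the factor $n_a^{1/2}$, and $\bar\xi_a$ raises it with the factor $(n_a + 1)^{1/2}$. The Python code below checks $\xi_a\bar\xi_a = n_a + 1$, $\bar\xi_a\xi_a = n_a$ and the three commutation relations on all basis states with at most three systems in each of three states. The dictionary representation and the helper names are assumptions of this illustration.

```python
from itertools import product
from math import sqrt

# States of several modes are stored as {(n_1, n_2, ...): amplitude}.


def xi(a, state):
    """xi_a = e^{-iw_a} n_a^{1/2}: lowers n_a by one with factor n_a^{1/2}."""
    out = {}
    for occ, amp in state.items():
        if occ[a] > 0:
            new = occ[:a] + (occ[a] - 1,) + occ[a + 1:]
            out[new] = out.get(new, 0) + sqrt(occ[a]) * amp
    return out


def xi_bar(a, state):
    """xibar_a = n_a^{1/2} e^{iw_a}: raises n_a by one with factor (n_a + 1)^{1/2}."""
    out = {}
    for occ, amp in state.items():
        new = occ[:a] + (occ[a] + 1,) + occ[a + 1:]
        out[new] = out.get(new, 0) + sqrt(occ[a] + 1) * amp
    return out


def commutator(f, g, state):
    """(fg - gf) applied to state, for state maps f and g."""
    fg, gf = f(g(state)), g(f(state))
    return {k: fg.get(k, 0) - gf.get(k, 0) for k in set(fg) | set(gf)}


def is_multiple(state, ref, c, tol=1e-12):
    """True if state equals c times ref."""
    keys = set(state) | set(ref)
    return all(abs(state.get(k, 0) - c * ref.get(k, 0)) < tol for k in keys)


MODES = 3
basis = [{occ: 1.0} for occ in product(range(4), repeat=MODES)]

ok_14 = all(
    is_multiple(xi(a, xi_bar(a, s)), s, occ[a] + 1)   # xi_a xibar_a = n_a + 1
    and is_multiple(xi_bar(a, xi(a, s)), s, occ[a])   # xibar_a xi_a = n_a
    for s in basis for occ in s for a in range(MODES))

ok_15 = all(
    is_multiple(commutator(lambda t: xi(a, t), lambda t: xi(b, t), s), s, 0)
    and is_multiple(commutator(lambda t: xi_bar(a, t), lambda t: xi_bar(b, t), s), s, 0)
    and is_multiple(commutator(lambda t: xi(a, t), lambda t: xi_bar(b, t), s), s, int(a == b))
    for s in basis for a in range(MODES) for b in range(MODES))
```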
Expressed in terms of the $\xi$'s and $\bar{\xi}$'s, equation (12) takes the simple form

$$U = \sum_{a,b} \bar{\xi}_a U_{ab} \xi_b. \tag{16}$$


We could carry through all the preceding work with reference to a different initial representation, say one in which the observables $Q_1, Q_2, \ldots, Q_n$ describing the first, second, ..., last system respectively are diagonal, instead of $q_1, q_2, \ldots, q_n$. If $Q^{(1)}, Q^{(2)}, \ldots$ are the eigenvalues of a $Q$, we should introduce observables $n_A, n_B, \ldots$ giving the numbers of systems with $Q$'s equal to $Q^{(1)}, Q^{(2)}, \ldots$ respectively. Corresponding to these new $n$'s we should have new $w$'s, say $w_A, w_B, \ldots$ (defined only in exponentials like $e^{iw_A}$, $e^{-iw_A}$), and new $\xi$'s and $\bar{\xi}$'s, say $\xi_A, \xi_B, \ldots$ and $\bar{\xi}_A, \bar{\xi}_B, \ldots$. The new equation (16) would read

$$U = \sum_{A,B} \bar{\xi}_A U_{AB} \xi_B, \tag{17}$$

where

$$U_{AB} = (Q^{(A)}|U|Q^{(B)}) = \sum_{a,b} (Q^{(A)}|q^{(a)})(q^{(a)}|U|q^{(b)})(q^{(b)}|Q^{(B)}) = \sum_{a,b} (Q^{(A)}|q^{(a)})\,U_{ab}\,(q^{(b)}|Q^{(B)}), \tag{18}$$


$(Q^{(A)}|q^{(a)})$ and $(q^{(a)}|Q^{(A)})$ being the transformation functions between the $q$'s and $Q$'s for a single system of the assembly. Equating the right-hand sides of (16) and (17) and using (18), we get

$$\sum_{a,b} \bar{\xi}_a U_{ab} \xi_b = \sum_{A,B,a,b} \bar{\xi}_A\,(Q^{(A)}|q^{(a)})\,U_{ab}\,(q^{(b)}|Q^{(B)})\,\xi_B. \tag{19}$$


Now $U_r$ in (3) can be an arbitrary function of the dynamical variables describing the $r$-th system and so the matrix elements $U_{ab}$ can be arbitrary. Since (19) holds with arbitrary $U_{ab}$, we must have

$$\bar{\xi}_a = \sum_A \bar{\xi}_A\,(Q^{(A)}|q^{(a)}), \qquad \xi_b = \sum_B (q^{(b)}|Q^{(B)})\,\xi_B. \tag{20}$$


These equations give the transformation laws for the $\xi$'s and $\bar{\xi}$'s.
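The consistency of (16) to (20) can be illustrated numerically. The sketch below assumes three hypothetical ingredients: a two-state system, a unitary matrix `S[a][A]` standing for the transformation function $(q^{(a)}|Q^{(A)})$, and a Hermitian `U`. It forms $U_{AB}$ by (18) and builds $\xi_A$ and $\bar\xi_A$ from the $\xi_a$ and $\bar\xi_a$ by the inverse of (20). It then checks that $\sum\bar\xi_A U_{AB}\xi_B$ and $\sum\bar\xi_a U_{ab}\xi_b$ act identically on a sample state. The occupation-number dictionaries and helper names are illustrative choices.

```python
from math import sqrt


def xi(a, state):
    """Lowers n_a by one with factor n_a^{1/2}; states are {occupations: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[a] > 0:
            new = occ[:a] + (occ[a] - 1,) + occ[a + 1:]
            out[new] = out.get(new, 0) + sqrt(occ[a]) * amp
    return out


def xi_bar(a, state):
    """Raises n_a by one with factor (n_a + 1)^{1/2}."""
    out = {}
    for occ, amp in state.items():
        new = occ[:a] + (occ[a] + 1,) + occ[a + 1:]
        out[new] = out.get(new, 0) + sqrt(occ[a] + 1) * amp
    return out


def accumulate(out, state, c):
    """out += c * state, in place."""
    for k, v in state.items():
        out[k] = out.get(k, 0) + c * v


def apply_bilinear(state, creators, annihilators, M):
    """Apply sum_{A,B} creators[A] M[A][B] annihilators[B] to state."""
    out = {}
    for A, row in enumerate(M):
        for B, m in enumerate(row):
            accumulate(out, creators[A](annihilators[B](state)), m)
    return out


D = 2
r = sqrt(0.5)
S = [[r, r], [1j * r, -1j * r]]               # hypothetical unitary S[a][A] = (q^(a)|Q^(A))
U = [[0.7, 0.2 - 0.5j], [0.2 + 0.5j, -0.4]]   # hypothetical Hermitian U_ab

# Equation (18): U_AB = sum_{a,b} (Q^(A)|q^(a)) U_ab (q^(b)|Q^(B)).
U_new = [[sum(S[a][A].conjugate() * U[a][b] * S[b][B] for a in range(D) for b in range(D))
          for B in range(D)] for A in range(D)]


def xi_new(B):
    """xi_B = sum_a (Q^(B)|q^(a)) xi_a, the inverse of the second of equations (20)."""
    def op(state):
        out = {}
        for a in range(D):
            accumulate(out, xi(a, state), S[a][B].conjugate())
        return out
    return op


def xi_bar_new(A):
    """xibar_A = sum_a (q^(a)|Q^(A)) xibar_a, the conjugate of xi_new(A)."""
    def op(state):
        out = {}
        for a in range(D):
            accumulate(out, xi_bar(a, state), S[a][A])
        return out
    return op


psi = {(2, 1): 0.6, (0, 3): -0.8j, (1, 1): 0.3}   # hypothetical state
old = apply_bilinear(psi, [lambda s, a=a: xi_bar(a, s) for a in range(D)],
                     [lambda s, b=b: xi(b, s) for b in range(D)], U)
new = apply_bilinear(psi, [xi_bar_new(A) for A in range(D)],
                     [xi_new(B) for B in range(D)], U_new)
max_diff = max(abs(old.get(k, 0) - new.get(k, 0)) for k in set(old) | set(new))
```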




Equations (20) show the existence of a remarkable analogy between the variables $\xi_a$, which may all be regarded as forming one operator involving the parameter $q$, and the representative $(q|)$ of a state of a single system of the assembly. These two functions of $q$ have precisely the same transformation law under a passage from $q$ to $Q$. Further, the interpretation (14) for $\bar{\xi}_a \xi_a$ is to some extent analogous to the ordinary physical interpretation of $|(q|)|^2$. The analogy extends also to equations of motion. If we suppose the systems of our Einstein-Bose assembly to be all moving under the action of some external field of force, with no interaction between the systems, the total Hamiltonian for the assembly will be of the form of $U$ in (3), where $U_r$ is the Hamiltonian for the $r$-th system alone, moving under the action of the external field of force. Taking $U$ as Hamiltonian, we get as equations of motion for the $\xi$'s, from (16) and the quantum conditions (15),


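$$\begin{aligned} ih\,\frac{d\xi_a}{dt} &= \xi_a U - U\xi_a \\ &= \xi_a \sum_{c,b} \bar{\xi}_c U_{cb} \xi_b - \sum_{c,b} \bar{\xi}_c U_{cb} \xi_b \xi_a \\ &= \sum_b U_{ab} \xi_b. \end{aligned} \tag{21}$$

This is of the same form as the Schrödinger wave equation for the $r$-th system alone with the Hamiltonian $U_r$, $\xi_a$ playing the part of the wave function $(q^{(a)}|)$.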
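Equation (21) rests only on the quantum conditions (15), so it can be spot-checked on occupation-number states. The Python sketch below uses a hypothetical three-state matrix `U` and a hypothetical sample state, with a dictionary-based occupation-number representation assumed for illustration. It compares $\xi_a U - U\xi_a$, with $U = \sum\bar\xi_c U_{cb}\xi_b$, against $\sum_b U_{ab}\xi_b$.

```python
from math import sqrt


def xi(a, state):
    """Lowers n_a by one with factor n_a^{1/2}; states are {occupations: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[a] > 0:
            new = occ[:a] + (occ[a] - 1,) + occ[a + 1:]
            out[new] = out.get(new, 0) + sqrt(occ[a]) * amp
    return out


def xi_bar(a, state):
    """Raises n_a by one with factor (n_a + 1)^{1/2}."""
    out = {}
    for occ, amp in state.items():
        new = occ[:a] + (occ[a] + 1,) + occ[a + 1:]
        out[new] = out.get(new, 0) + sqrt(occ[a] + 1) * amp
    return out


def accumulate(out, state, c):
    """out += c * state, in place."""
    for k, v in state.items():
        out[k] = out.get(k, 0) + c * v


# Hypothetical Hermitian single-system Hamiltonian matrix U_ab for three states.
U = [[0.3, 1.1, 0.0],
     [1.1, -0.2, 0.5j],
     [0.0, -0.5j, 0.8]]
D = len(U)


def apply_U(state):
    """U = sum_{c,b} xibar_c U_cb xi_b, equation (16)."""
    out = {}
    for c in range(D):
        for b in range(D):
            accumulate(out, xi_bar(c, xi(b, state)), U[c][b])
    return out


def lhs(a, state):
    """(xi_a U - U xi_a) applied to state."""
    out = {}
    accumulate(out, xi(a, apply_U(state)), 1)
    accumulate(out, apply_U(xi(a, state)), -1)
    return out


def rhs(a, state):
    """sum_b U_ab xi_b applied to state: the right-hand side of (21)."""
    out = {}
    for b in range(D):
        accumulate(out, xi(b, state), U[a][b])
    return out


psi = {(2, 0, 1): 0.5, (1, 1, 1): -0.25j, (0, 3, 0): 0.8}  # hypothetical state
max_diff = 0.0
for a in range(D):
    left, right = lhs(a, psi), rhs(a, psi)
    for k in set(left) | set(right):
        max_diff = max(max_diff, abs(left.get(k, 0) - right.get(k, 0)))
```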

We have now established the general result that the transformation equations and equations of motion for the $\xi$'s describing an Einstein-Bose assembly of systems acted on by an external field of force may be obtained from the corresponding equations for the wave function describing a single system of the assembly by the application of a certain definite procedure, which is called second quantization. This consists in assuming that the wave function $(q^{(a)}|)$ describing the single system is not a numerical function of the parameter $q^{(a)}$, but is an operator for each $q^{(a)}$, satisfying the quantum conditions (15). It then goes over into $\xi_a$, the form of its wave equation and transformation law remaining unaltered.


63. Waves and Einstein-Bose Particles 


The theory of the preceding section provides the mathematical basis for 
the reconciliation of the wave and corpuscular pictures of light. It shows 
that an assembly of particles satisfying the Einstein-Bose statistics may be 
described by dynamical variables $n_a$ and their conjugates, which are formally
the same as the action and angle variables describing simple harmonic oscillators. 
Thus an Einstein-Bose assembly is dynamically equivalent to a set of simple 
harmonic oscillators, there being one oscillator corresponding to each of a complete set of independent states of a system of the assembly, the action variable of
the oscillator corresponding to the number of systems in the state. We may 
replace the set of simple harmonic oscillators by a train of waves, each Fourier 
component of the waves being dynamically equivalent to a simple harmonic 
oscillator. We then see that our Einstein-Bose assembly is dynamically equivalent 
to a system of waves. Thus if we have any vibrating medium which we wish 
to deal with according to quantum mechanics, we may treat it either as a system 
of waves or as an assembly of Einstein-Bose particles, the two points of view being 
consistent and mathematically equivalent. 

We may apply our theory of Einstein-Bose assemblies to the case when the $n_a$'s are all large, so that the $\xi_a$'s are also large and we may neglect the $\delta_{ab}$ on the right-hand side of (15). With this approximation our dynamical variables $\xi_a$ and $\bar{\xi}_a$ all commute with each other and may be counted as numbers, and the equations of motion (21) become ordinary differential equations between numbers. These equations are now identical with the Schrödinger equation for a single one of the systems perturbed by the external field of force, the set of numbers $\xi_a$ playing the part of the wave function $(q^{(a)}|)$. If this wave function is normalized to $n$, it may be considered to represent an assembly of $n$ independent systems in the way discussed in §51. The interpretation of the wave function, namely the interpretation of $|(q^{(a)}|)|^2$ as the probable number of systems in state $q^{(a)}$, now corresponds exactly to the interpretation of the $\xi_a$'s provided by
equation (14). We thus have the result that an assembly of a large number of 
similar systems is described by the same equations, whose solutions are to be 
interpreted in the same way, whether the systems are independent or satisfy 
the Einstein-Bose statistics. 

Since an assembly of independent systems and an assembly satisfying 
the Einstein-Bose statistics are two physically different things, it may seem strange 
that they are both to be described by the same set of equations, even though 
we are restricting ourselves to the limiting case of a large number of systems 
in the assembly. The solution of the paradox lies in the fact that there remains 
an essential difference between the mathematical treatments of the two assemblies, 
in spite of the similarities pointed out above, as may be seen from the following 
discussion. An assembly of independent systems is described as completely as 
quantum mechanics allows when we are given the number of systems in each state. 
The modulus of the wave function $(q^{(a)}|)$ is then determined for each state $q^{(a)}$, but not its phase. This phase has no physical meaning. We must average over all values of this phase if it appears in the result of any calculation. On the other hand, for an assembly satisfying the Einstein-Bose statistics, the $\xi_a$'s are dynamical variables and their phases as well as their moduli are observables.




There are two generalizations which we can easily make in the form of 
Hamiltonian which we had at the end of the preceding section for our Einstein-Bose 
assembly. Firstly, we may suppose that the various systems of the assembly 
are perturbed, not by an external field of force, but by interaction with some other 
atomic system, which we shall call the perturber. This will make a difference 
because the reaction of the assembly on the perturber will be taken into account. 
We must now introduce some more dynamical variables, $\beta$ say, to describe the perturber. Our Hamiltonian will be of the form


$$H = H_P + \sum_r U_r, \tag{22}$$


where $H_P$ is the Hamiltonian that describes the perturber alone and $U_r$ is the energy associated with the $r$-th system of the assembly, consisting of its proper energy plus its interaction energy with the perturber. $H_P$ will be a function of the $\beta$'s only and $U_r$ will be a function of the variables describing the $r$-th system and also of the $\beta$'s. We can express the new sum $\sum_r U_r$ in terms of the $n_a$ and $w_a$ variables by the same method as before, and the result will be of the same form (16), with the difference that the $U_{ab}$'s will no longer be numbers but will be functions of the $\beta$'s. The definition of $U_{ab}$ will now be that its representative in the $\zeta$-representation, the $\zeta$'s being any complete set of commuting observables taken out of the $\beta$'s, is


$$(\zeta'|U_{ab}|\zeta'') = (q^{(a)}\zeta'|U_r|q^{(b)}\zeta''), \tag{23}$$


the matrix on the right being the representative of $U_r$ in the representation in which $q_r$ and $\zeta$ are diagonal. We shall still have $U_{ab}$ commuting with the $n$'s and $e^{iw}$'s and $e^{-iw}$'s. The total Hamiltonian (22) will now be

$$H = H_P + \sum_{a,b} \bar{\xi}_a U_{ab} \xi_b. \tag{24}$$


The second generalization which we can make is to allow the total number of 
systems in the assembly to vary. This generalization is necessary when the theory 
is applied to photons, since any emission or absorption of a photon by an atomic 
system results in a change in the total number of photons in existence. We can 
get the theory for a varying number of systems from the theory for a fixed number 
by postulating in the latter theory a zero state for the systems, in which they are 
not physically observable in any way. Variations in the total number of observable 
systems can then be interpreted as arising from systems making transitions into 
or out of the zero state. We must suppose the number of systems in the zero 
state to be infinitely great, in order to allow the number of observable systems 
to increase without limit. 




Using the suffix 0 to denote the zero state, we can write the Hamiltonian (24) as 


$$H = H_P + \bar{\xi}_0 U_{00} \xi_0 + \sum_a \bar{\xi}_a U_{a0} \xi_0 + \sum_b \bar{\xi}_0 U_{0b} \xi_b + \sum_{a,b} \bar{\xi}_a U_{ab} \xi_b, \tag{25}$$


the value 0 for $a$ or $b$ being excluded from the summations in (25). We may assume $U_{00}$ to be zero, since it has no physical meaning. Since $n_0$ is infinitely great, $\xi_0$ and $\bar{\xi}_0$ will also be infinitely great, of the order of $n_0^{1/2}$. The terms involving $\xi_0$ and $\bar{\xi}_0$ in the Hamiltonian (25), namely $\sum_a \bar{\xi}_a U_{a0} \xi_0$ and $\sum_b \bar{\xi}_0 U_{0b} \xi_b$, are the terms which give rise to transitions into or out of the zero state and must be finite in order to lead to finite transition probabilities. This requires that $U_{a0}$ and $U_{0b}$ shall be infinitely small in such a way that $U_{a0}\xi_0$ and $\bar{\xi}_0 U_{0b}$ are finite. Put

$$U_{a0}\xi_0 = V_a, \qquad \bar{\xi}_0 U_{0b} = \bar{V}_b, \tag{26}$$

so that $V_a$ and $\bar{V}_b$ are finite. The Hamiltonian (25) now becomes


$$H = H_P + \sum_a \bar{\xi}_a V_a + \sum_b \bar{V}_b \xi_b + \sum_{a,b} \bar{\xi}_a U_{ab} \xi_b. \tag{27}$$


We may suppose the $V_a$ and $\bar{V}_a$ here to be, like the $U_{ab}$, functions only of the dynamical variables $\beta$ describing the perturber, and neglect their dependence on $\xi_0$ and $\bar{\xi}_0$ given by (26). This is justified because the P.B. of $\xi_0$ and $\bar{\xi}_0$ is infinitely small compared with $\xi_0$ and $\bar{\xi}_0$ themselves, so that $\xi_0$ and $\bar{\xi}_0$ may without error be counted as numbers.


64. Application to Photons 


In Chapter IX a theory was given of the scattering, absorption and emission 
of a particle by an atomic system. The interaction of the particle and atomic 
system was assumed to be describable by an interaction energy V appearing in 
the Hamiltonian, which interaction energy had to be small but was otherwise 
arbitrary. If we could determine the energy of interaction between a photon and 
an atom or molecule, we could apply the methods of Chapter IX immediately to 
the case when the incident particle is a photon. We should then have a theory of 
the interaction of light with an atomic system. We cannot, however, determine 
this energy of interaction directly from analogy with the classical theory, in the way 
we obtained the Hamiltonians for most of the systems dealt with up to the present, 
since the phenomenon of the interaction of a photon with an atom has no analogue 
in the classical theory. We must proceed in a more indirect way. We know that 
the interaction of an atom with a field of radiation can be described approximately 
by classical electrodynamics when the field of radiation consists of a large number 
of photons. Our method is therefore to assume an arbitrary interaction energy $V$ between a single photon and the atom and then in terms of $V$ to investigate
the interaction of a large number of photons with the atom. By comparing 
this interaction with that given by classical electrodynamics we can then obtain V. 

For investigating the interaction of a large number of photons with an atom 
we can use the foregoing theory of an Einstein-Bose assembly, taking the photons 
to be the systems of the assembly and the atom to be the perturber. Since there is 
no interaction between photons, the total energy will be of the form (22), $H_P$ being the energy of the atom alone and $U_r$ the energy associated with the $r$-th photon, consisting of its proper energy $h\nu_r$, together with its energy of interaction, $V_r$ say, with the atom. Thus


$$U_r = h\nu_r + V_r. \tag{28}$$


It is convenient to take the variables $q$ to be constants of the motion for an unperturbed photon, so that the $q^{(a)}$'s label the stationary states of the photons and the $n_a$'s are the numbers of photons in the stationary states. This requires that the $q$'s for a photon shall specify its momentum and polarization. Let us introduce a vector $k$, equal to $h^{-1}$ times the momentum of a photon, and suppose the $q$'s for a photon to consist of its $k$ together with a polarization variable. Then each value for the suffix $a$ in $n_a$ and $\xi_a$, denoting a stationary state of a photon, will correspond to a value for $k$, which we call $k_a$, and a value for the polarization variable.

For each value of k there are two independent stationary states, corresponding 
to the two independent states of polarization of a photon. We shall take these two 
independent stationary states to correspond to two perpendicular states of linear 
polarization. We ought now to verify that the theory remains invariant under 
a rotation of our standard directions of polarization. Calling the two original $\xi$'s for the two states for the given value of $k$ $\xi_1$ and $\xi_2$, we obtain on rotating our standard directions of polarization through an angle $\theta$ two new $\xi$'s, $\xi_1'$ and $\xi_2'$ say, given by

$$\xi_1' = \xi_1 \cos\theta + \xi_2 \sin\theta, \qquad \xi_2' = -\xi_1 \sin\theta + \xi_2 \cos\theta,$$

since, as shown by equations (20), the transformation law for the $\xi$'s is the same as that for the states of polarization of a single photon. It can now easily be verified that if $\xi_1$ and $\xi_2$ satisfy the quantum conditions (15), then so do $\xi_1'$ and $\xi_2'$. This is all that is necessary to establish the required invariance. We could alternatively work with the circular directions of polarization, which would mean using two $\xi$'s whose expressions in terms of the above ones are $2^{-1/2}(\xi_1 + i\xi_2)$ and $2^{-1/2}(\xi_1 - i\xi_2)$, which again satisfy the quantum conditions (15).
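The invariance under a change of polarization axes reduces to checking that the new $\xi$'s obey (15). The sketch below is an illustration in which the rotation angle, the restriction to states with at most three photons in each polarization, and the helper names are all assumptions. It verifies the conditions for the rotated pair and for the circular combinations $2^{-1/2}(\xi_1 \pm i\xi_2)$. It also shows that a non-unitary combination such as $\xi_1 + \xi_2$ fails them.

```python
from itertools import product
from math import cos, sin, sqrt


def xi(a, state):
    """xi_a: lowers n_a by one with factor n_a^{1/2}; states are {occupations: amplitude}."""
    out = {}
    for occ, amp in state.items():
        if occ[a] > 0:
            new = occ[:a] + (occ[a] - 1,) + occ[a + 1:]
            out[new] = out.get(new, 0) + sqrt(occ[a]) * amp
    return out


def xi_bar(a, state):
    """xibar_a: raises n_a by one with factor (n_a + 1)^{1/2}."""
    out = {}
    for occ, amp in state.items():
        new = occ[:a] + (occ[a] + 1,) + occ[a + 1:]
        out[new] = out.get(new, 0) + sqrt(occ[a] + 1) * amp
    return out


def combo(terms):
    """State map for sum_k c_k f_k, given terms = [(c_k, f_k), ...]."""
    def op(state):
        out = {}
        for c, f in terms:
            for k, v in f(state).items():
                out[k] = out.get(k, 0) + c * v
        return out
    return op


def commutator(f, g, state):
    """(fg - gf) applied to state."""
    fg, gf = f(g(state)), g(f(state))
    return {k: fg.get(k, 0) - gf.get(k, 0) for k in set(fg) | set(gf)}


def is_multiple(state, ref, c, tol=1e-12):
    keys = set(state) | set(ref)
    return all(abs(state.get(k, 0) - c * ref.get(k, 0)) < tol for k in keys)


def satisfies_15(lows, highs, states):
    """Quantum conditions (15) for the pairs (lows[i], highs[i]) on the given states."""
    n = len(lows)
    return all(
        is_multiple(commutator(lows[i], lows[j], s), s, 0)
        and is_multiple(commutator(highs[i], highs[j], s), s, 0)
        and is_multiple(commutator(lows[i], highs[j], s), s, int(i == j))
        for s in states for i in range(n) for j in range(n))


x1, x2 = (lambda s: xi(0, s)), (lambda s: xi(1, s))
xb1, xb2 = (lambda s: xi_bar(0, s)), (lambda s: xi_bar(1, s))
states = [{occ: 1.0} for occ in product(range(4), repeat=2)]

theta = 0.37  # arbitrary rotation angle
rot = [combo([(cos(theta), x1), (sin(theta), x2)]),
       combo([(-sin(theta), x1), (cos(theta), x2)])]
rot_bar = [combo([(cos(theta), xb1), (sin(theta), xb2)]),
           combo([(-sin(theta), xb1), (cos(theta), xb2)])]

r = sqrt(0.5)
circ = [combo([(r, x1), (1j * r, x2)]), combo([(r, x1), (-1j * r, x2)])]
circ_bar = [combo([(r, xb1), (-1j * r, xb2)]), combo([(r, xb1), (1j * r, xb2)])]

# For contrast, xi_1 + xi_2 without the factor 2^{-1/2} does not satisfy (15).
bad, bad_bar = [combo([(1, x1), (1, x2)])], [combo([(1, xb1), (1, xb2)])]
```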




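Taking the representative of equation (28), we get

$$U_{ab} = h\nu_a \delta_{ab} + V_{ab},$$

$\nu_a$ being the frequency of a photon in the stationary state $a$. The $V_{ab}$'s, like the $U_{ab}$'s, are functions of the dynamical variables of the atom. The expression (27) for the Hamiltonian now becomes

$$H = H_P + H_R + H_Q, \tag{29}$$

where

$$H_R = \sum_a n_a h\nu_a, \tag{30}$$

the total proper energy of the radiation, and

$$\begin{aligned} H_Q &= \sum_a \{\bar{\xi}_a V_a + \bar{V}_a \xi_a\} + \sum_{a,b} \bar{\xi}_a V_{ab} \xi_b \\ &= \sum_a \{V_a n_a^{1/2} e^{iw_a} + \bar{V}_a (n_a + 1)^{1/2} e^{-iw_a}\} + \sum_{a,b} V_{ab}\, n_a^{1/2} e^{iw_a} (n_b + 1)^{1/2} e^{-iw_b}, \end{aligned} \tag{31}$$

the total interaction energy.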

A photon has a continuous range of stationary states and not a discrete set, 
since the components of $k$ may have any values from $-\infty$ to $\infty$. We therefore
ought to change the sums in (31) into integrals. To do this accurately 
would not be very easy, since it would mean dealing according to quantum 
mechanics with a dynamical system with continuously many degrees of freedom, 
which would require a new scheme of notation and a new mathematical technique. 
We are, however, interested in the interaction energy (31) mainly with regard to 
the limiting case of large n’s, when classical mechanics may be assumed to apply 
for the radiation, since we wish to compare the interaction energy in this case with 
that provided by classical electromagnetic theory and thus obtain expressions for 
the $V_a$'s and $V_{ab}$'s. In this limiting case the passage from sums to integrals is
quite easy. 

Let $s_a$ denote the number of states of the photon (with a particular polarization) per unit of $k$-space about the value $k_a$. We assume $s_a$ to be large, but an arbitrary function of $k_a$, and investigate the limit of (31) when $s_a$ is made infinite. The number of photons (with a particular polarization) per unit of $k$-space about the value $k_a$ is

$$N_a = n_a s_a, \tag{32}$$

provided $n_a$ varies in some roughly continuous way from one state to the next. Let $(a|V|b)$ be the matrix† representing the interaction energy $V$ for one photon


† The matrix elements of this matrix are actually functions of the dynamical variables describing the atom, like the $V_{ab}$'s, and not numbers, but this does not invalidate the argument. The representation is an ‘incomplete’ one, whose representatives are defined in terms of those of a complete one by equations like (23).




in the representation for one photon, when we use the normalization rule (23) 
of Chapter IV for the parameter k. This representation differs from the one 
we have used up to the present in this chapter, in which $V$ is represented by $V_{ab}$, through the factor $s$ in the weight function, according to the work at the end of §24, so that the matrix elements in the two representations are connected by

$$(a|V|b) = V_{ab}(s_a s_b)^{1/2}. \tag{33}$$

Similarly, the matrix elements $(a|V|0)$ and $(0|V|a)$, referring to transitions into or out of the zero state, are connected with $V_a$ and $\bar{V}_a$ by

$$(a|V|0) = V_a s_a^{1/2}, \qquad (0|V|a) = \bar{V}_a s_a^{1/2}.$$
We can now express the interaction energy (31) in the limiting case of large $n$'s, when the $n$'s may be assumed to commute with the $e^{iw}$'s and $e^{-iw}$'s, in the form

$$\begin{aligned} H_Q &= \sum_a \{(a|V|0)\,n_a^{1/2} e^{iw_a} + (0|V|a)\,n_a^{1/2} e^{-iw_a}\}\,s_a^{-1/2} + \sum_{a,b} (a|V|b)\,n_a^{1/2} n_b^{1/2}\, e^{i(w_a - w_b)}\,s_a^{-1/2} s_b^{-1/2} \\ &= \sum \int \{(a|V|0)\,N_a^{1/2} e^{iw_a} + (0|V|a)\,N_a^{1/2} e^{-iw_a}\}\,dk_a + \sum\sum \iint (a|V|b)\,N_a^{1/2} N_b^{1/2}\, e^{i(w_a - w_b)}\,dk_a\,dk_b \end{aligned} \tag{34}$$

in the limit $s \to \infty$, the sums in (34) referring only to the polarization. The fact that the $s$'s have disappeared from this result justifies our method of dealing with a continuous range of states as a limiting case of a discrete set.
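The disappearance of the $s$'s can be seen in a toy calculation. The sketch below makes several simplifications for illustration only. Here $k$ is one-dimensional and restricted to $0 \le k < 1$, there is a single polarization, $s$ is uniform, and the phase factors $e^{\pm iw}$ are omitted. The functions `g`, `G` and `N`, standing for $(a|V|0)$, $(a|V|b)$ and $N_a$, are hypothetical. With $V_a = (a|V|0)s^{-1/2}$, $V_{ab} = (a|V|b)s^{-1}$ and $n_a = N_a/s$, the sums over states settle down to $s$-independent integrals as $s$ grows.

```python
from math import sqrt


def zero_state_sum(s, g, N):
    """sum_a V_a n_a^{1/2} over states in 0 <= k < 1, with s states per unit k."""
    total = 0.0
    for a in range(s):
        k = (a + 0.5) / s          # midpoint of the a-th cell
        V_a = g(k) / sqrt(s)       # from (a|V|0) = V_a s_a^{1/2}
        n_a = N(k) / s             # from N_a = n_a s_a, equation (32)
        total += V_a * sqrt(n_a)
    return total


def pair_sum(s, G, N):
    """sum_{a,b} V_ab n_a^{1/2} n_b^{1/2}, with V_ab from (a|V|b) = V_ab (s_a s_b)^{1/2}."""
    total = 0.0
    for a in range(s):
        ka = (a + 0.5) / s
        for b in range(s):
            kb = (b + 0.5) / s
            V_ab = G(ka, kb) / s
            total += V_ab * sqrt(N(ka) / s) * sqrt(N(kb) / s)
    return total


def g(k):
    return 2.0 * k        # hypothetical (a|V|0) as a function of k


def G(ka, kb):
    return ka * kb        # hypothetical (a|V|b)


def N(k):
    return k * k          # hypothetical photon number per unit k


# Both sums approach the integrals of g N^{1/2} and G N_a^{1/2} N_b^{1/2}, i.e. 2/3 and 1/9.
for s in (10, 100, 300):
    print(s, zero_state_sum(s, g, N), pair_sum(s, G, N))
```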


65. Determination of the Interaction Energy between a Photon and an Atom


We shall now determine the matrix elements (a|V|0), (0|V|a) and (a|V|b) by 
comparing (34) with the classical expression for the interaction energy between 
an atom and a field of radiation. For simplicity we shall suppose the atom to consist 
of a single electron moving in an electrostatic field of force. The field of radiation 
may be described by a 4-vector potential. This potential is to a certain extent 
arbitrary and may be chosen so that its time component vanishes. The field is 
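then completely described by a magnetic potential $A_x, A_y, A_z$ or $\mathbf{A}$. The change that the field causes in the Hamiltonian describing the atom is now, as explained at the beginning of §43,

$$\frac{1}{2m}\left(\mathbf{p} + \frac{e}{c}\mathbf{A}\right)^2 - \frac{1}{2m}\mathbf{p}^2 = \frac{e}{mc}(\mathbf{p}, \mathbf{A}) + \frac{e^2}{2mc^2}\mathbf{A}^2. \tag{35}$$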



This is the classical interaction energy, which is to be compared with (34). 
The A that occurs here ought really to be the value of the magnetic potential 
at the point where the electron is momentarily situated. It is, however, a good 
enough approximation if we take this A to be the magnetic potential at some fixed 
point in the atom, such as the nucleus, provided we are dealing with radiation 
whose wave-length is large compared with the dimensions of the atom. 

To make the comparison between (34) and (35), we must first resolve the field 
of radiation into progressive plane-polarized trains of waves. The electric and 
magnetic fields of one of these trains of waves, whose frequency is $\nu_a$ and whose direction is specified by the vector $k_a$, are of the form

$$\mathcal{E}_a \cos[2\pi\nu_a t - (k_a, x) + \gamma_a], \qquad \mathcal{H}_a \cos[2\pi\nu_a t - (k_a, x) + \gamma_a],$$

the amplitudes $\mathcal{E}_a$ and $\mathcal{H}_a$ being vectors of equal length that are perpendicular to the direction of motion and to each other. The total electric and magnetic fields are expressible as Fourier integrals of the form

$$\mathcal{E} = \sum \int \mathcal{E}_a \cos[2\pi\nu_a t - (k_a, x) + \gamma_a]\,dk_a, \qquad \mathcal{H} = \sum \int \mathcal{H}_a \cos[2\pi\nu_a t - (k_a, x) + \gamma_a]\,dk_a, \tag{36}$$

the $\sum$'s here meaning sums over both states of polarization for each value of $k_a$.
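We must obtain the distribution of energy of this field over the various Fourier components. At time $t = 0$ we have

$$\begin{aligned} \int \mathcal{E}^2\,dx &= \sum\sum \iint (\mathcal{E}_a, \mathcal{E}_b)\,dk_a\,dk_b \int \cos[\gamma_a - (k_a, x)]\cos[\gamma_b - (k_b, x)]\,dx \\ &= \sum\sum \iint (\mathcal{E}_a, \mathcal{E}_b)\,dk_a\,dk_b\; 4\pi^3 \{\cos(\gamma_a + \gamma_b)\,\delta(k_a + k_b) + \cos(\gamma_a - \gamma_b)\,\delta(k_a - k_b)\} \end{aligned}$$

with the help of (15) of Chapter IV, the $\sum$'s here meaning sums over both states of polarization for each value of $k_a$ and $k_b$. Thus

$$\int \mathcal{E}^2\,dx = 4\pi^3 \sum \int (\mathcal{E}_a, \mathcal{E}_{a'}) \cos(\gamma_a + \gamma_{a'})\,dk_a + 4\pi^3 \sum \int \mathcal{E}_a^2\,dk_a,$$

where the Fourier component specified by $a'$ is such that $k_{a'} = -k_a$. Similarly,

$$\int \mathcal{H}^2\,dx = 4\pi^3 \sum \int (\mathcal{H}_a, \mathcal{H}_{a'}) \cos(\gamma_a + \gamma_{a'})\,dk_a + 4\pi^3 \sum \int \mathcal{H}_a^2\,dk_a.$$

On account of the connexion between the vectors $\mathcal{E}_a$ and $\mathcal{H}_a$ we have $\mathcal{E}_a^2 = \mathcal{H}_a^2$ and also $(\mathcal{E}_a, \mathcal{E}_{a'}) = -(\mathcal{H}_a, \mathcal{H}_{a'})$. Hence the total energy is

$$\frac{1}{8\pi}\int (\mathcal{E}^2 + \mathcal{H}^2)\,dx = \pi^2 \sum \int \mathcal{E}_a^2\,dk_a, \tag{37}$$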




and the energy per unit of $k$-space for a definite polarization is $\pi^2\mathcal{E}_a^2$. This may be equated to $h\nu_a N_a$, the $N_a$ having the same meaning as in the preceding section. Thus

$$\mathcal{E}_a^2 = \pi^{-2} h\nu_a N_a. \tag{38}$$


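The vector potential $\mathbf{A}$ may be expressed as a Fourier integral in the same way as $\mathcal{E}$ and $\mathcal{H}$. We have

$$\mathbf{A} = -\sum \int \mathbf{A}_a \sin[2\pi\nu_a t - (k_a, x) + \gamma_a]\,dk_a, \tag{39}$$

the vector $\mathbf{A}_a$ being in the same direction as $\mathcal{E}_a$ and having its length given by

$$A_a^2 = \left(\frac{c}{2\pi\nu_a}\right)^2 \mathcal{E}_a^2 = \frac{c^2 h N_a}{4\pi^4 \nu_a}. \tag{40}$$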


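At the origin $\mathbf{A}$ will have the value

$$-\sum \int \mathbf{A}_a \sin[2\pi\nu_a t + \gamma_a]\,dk_a = \sum \int \mathbf{A}_a \cos w_a\,dk_a,$$

$w_a$ being an angle variable of the same nature as those occurring in (34). This value for $\mathbf{A}$ substituted in expression (35) for the interaction energy gives

$$\begin{aligned} &\frac{e}{mc}\sum\int p_a A_a \cos w_a\,dk_a + \frac{e^2}{2mc^2}\sum\sum\iint (\mathbf{A}_a, \mathbf{A}_b)\cos w_a \cos w_b\,dk_a\,dk_b \\ &= \frac{e h^{1/2}}{2\pi^2 m}\sum\int p_a N_a^{1/2}\nu_a^{-1/2}\cos w_a\,dk_a + \frac{e^2 h}{8\pi^4 m}\sum\sum\iint \cos\theta_{ab}\,N_a^{1/2}N_b^{1/2}\nu_a^{-1/2}\nu_b^{-1/2}\cos w_a \cos w_b\,dk_a\,dk_b \end{aligned} \tag{41}$$

with the help of (40), where $p_a$ is the component of the momentum $\mathbf{p}$ of the electron in the direction of $\mathbf{A}_a$ or $\mathcal{E}_a$, and $\theta_{ab}$ is the angle between the vectors $\mathbf{A}_a$ and $\mathbf{A}_b$.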

If we write (41) in terms of $e^{iw}$ and $e^{-iw}$ instead of $\cos w$ and compare it with (34), we obtain

$$(a|V|0) = (0|V|a) = \frac{e h^{1/2}}{4\pi^2 m \nu_a^{1/2}}\,p_a, \qquad (a|V|b) = \frac{e^2 h}{16\pi^4 m \nu_a^{1/2}\nu_b^{1/2}}\cos\theta_{ab}. \tag{42}$$


We also find that there are certain terms in (41), namely those involving $e^{i(w_a + w_b)}$ or $e^{-i(w_a + w_b)}$, which have no corresponding terms in (34). This discrepancy shows an inadequacy of the assumption that the Hamiltonian describing the interaction of an assembly of photons with an atom is of the form (22). The extra terms in (41) would give rise to transitions in which two photons are simultaneously absorbed or emitted, and the possibility of such transitions requires a more complicated interaction energy than that assumed in (22). The physical effects of these terms are, however, small and unimportant, and so we shall neglect them.
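The comparison of (41) with (34) turns on the identity $\cos w_a\cos w_b = \tfrac14\{e^{i(w_a - w_b)} + e^{-i(w_a - w_b)} + e^{i(w_a + w_b)} + e^{-i(w_a + w_b)}\}$. The Python sketch below uses hypothetical symmetric coefficients and random angles. It splits a double sum of the type in (41) into two parts. The first has the form of (34) and carries the coefficient $\tfrac12$ that appears in $(a|V|b)$. The second consists of the two-photon terms in $e^{\pm i(w_a + w_b)}$, which have no counterpart there.

```python
import cmath
import math
import random

random.seed(1)
D = 4  # hypothetical number of photon states

# Hypothetical real symmetric coefficients c_ab, standing in for the constant factors
# times cos(theta_ab) N_a^{1/2} N_b^{1/2} in the second term of (41).
c = [[0.0] * D for _ in range(D)]
for a in range(D):
    for b in range(a, D):
        c[a][b] = c[b][a] = random.uniform(-1.0, 1.0)
w = [random.uniform(0.0, 2 * math.pi) for _ in range(D)]
pairs = [(a, b) for a in range(D) for b in range(D)]

# The double sum as it appears in (41).
cos_form = sum(c[a][b] * math.cos(w[a]) * math.cos(w[b]) for a, b in pairs)

# Part of the form of (34): coefficient 1/2 on e^{i(w_a - w_b)} after using c_ab = c_ba.
kept = sum(0.5 * c[a][b] * cmath.exp(1j * (w[a] - w[b])) for a, b in pairs)

# Two-photon part with no counterpart in (34): e^{+i(w_a + w_b)} and e^{-i(w_a + w_b)}.
dropped = sum(0.25 * c[a][b] * (cmath.exp(1j * (w[a] + w[b])) + cmath.exp(-1j * (w[a] + w[b])))
              for a, b in pairs)
```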

Equations (42) now give the interaction energy V between a single photon and 
the atom. This interaction energy cannot conveniently be expressed explicitly in 
terms of dynamical variables. In using (42) we may, without serious error, take for 
the momentum p of the electron its value when the atom is not perturbed by 
any radiation, namely $m\dot{\mathbf{x}}$. The left-hand sides of (42) are not ‘complete’ matrix elements, being functions of the dynamical variables of the atom, but we can obtain the ‘complete’ matrix elements from them by using formula (23). If the different stationary states of the atom alone are denoted by $a', a'', \ldots$, we shall have

$$(a a'|V|0 a'') = (0 a'|V|a a'') = \frac{e h^{1/2}}{4\pi^2 \nu_a^{1/2}}\,(a'|\dot{x}_a|a''), \tag{43}$$

$$(a a'|V|b a'') = \frac{e^2 h}{16\pi^4 m \nu_a^{1/2}\nu_b^{1/2}}\cos\theta_{ab}\,\delta_{a'a''}. \tag{44}$$

Each $a$ or $b$ here specifies a value for $k$, determining a momentum for the photon, and also a polarization variable determining a direction of electric force. The matrix element $(a'|\dot{x}_a|a'')$ is the component of the vector $(a'|\dot{\mathbf{x}}|a'')$ in the direction of the electric force specified by $a$, and similarly $\theta_{ab}$ is the angle between the directions of electric force specified by $a$ and $b$.


66. Emission, Absorption and Scattering of Radiation


We can now determine directly the coefficients of emission, absorption and 
scattering of radiation by substituting in the formulas of Chapter IX the values 
for the matrix elements given by (43) and (44). These matrix elements must 
first be corrected by the insertion of a factor $h^{-3/2}$ in (43) and $h^{-3}$ in (44), owing to the different weight functions of two representations: the one used in Chapter IX, with the momentum of the incident particle labelling the representatives, and the one of §65, with $k$, equal to $h^{-1}$ times this momentum, labelling
the representatives. 

For determining the emission probability we can use formula (56) of Chapter IX. This shows that for an atom in a state $a'$ the probability per unit time per unit solid angle of its spontaneously emitting a photon and dropping to a state $a''$ of lower energy is

$$\frac{4\pi^2 W P}{h c^2}\,\frac{e^2}{2\pi h^2 \nu}\,|(a'|\dot{x}_a|a'')|^2. \tag{45}$$


Now the energy, $W$, and momentum, $P$, of a photon of frequency $\nu$ are

$$W = h\nu, \qquad P = h\nu/c.$$

Again, from the Heisenberg law (15) of Chapter VI,

$$(a'|\dot{x}_a|a'') = 2\pi i\,\nu(a', a'')\,(a'|x_a|a''),$$

$\nu(a', a'')$ being the frequency connected with transitions from state $a'$ to state $a''$, which in the present case is just the frequency $\nu$ of the emitted radiation. These results substituted in (45) make the emission coefficient reduce to
These results substituted in (45) make the emission coefficient reduce to 


$$\frac{(2\pi\nu)^3}{h c^3}\,|(a'|e x_a|a'')|^2. \tag{46}$$


To obtain the rate of emission of energy per unit solid angle for a specified 
polarization, we must multiply this by hv. This gives for the total rate of emission 
of energy in all directions 

$$\frac{4(2\pi\nu)^4}{3c^3}\,|(a'|e\mathbf{x}|a'')|^2, \tag{47}$$


which is in agreement with expression (27) of Chapter VIII and justifies 
Heisenberg’s assumption for the interpretation of his matrix elements. 
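The step to the total rate (47) involves summing over the two polarizations and integrating over directions. The sketch below assumes that the coefficient for one direction and polarization is $(2\pi\nu)^3|(a'|ex_a|a'')|^2/hc^3$, and it uses a hypothetical real matrix element $(a'|\mathbf{x}|a'')$. It performs the integration numerically. The angular factor comes out as $8\pi/3$ times $|(a'|\mathbf{x}|a'')|^2$, which reproduces $4(2\pi\nu)^4|(a'|e\mathbf{x}|a'')|^2/3c^3$.

```python
import math


def polarization_sum(x, theta, phi):
    """Sum of (x . eps)^2 over the two polarizations perpendicular to direction (theta, phi)."""
    k = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
    x_dot_k = sum(xj * kj for xj, kj in zip(x, k))
    return sum(xj * xj for xj in x) - x_dot_k ** 2


def sphere_integral(f, n_theta=300, n_phi=300):
    """Midpoint-rule integral of f(theta, phi) over all directions."""
    dt, dp = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        t = (i + 0.5) * dt
        for j in range(n_phi):
            p = (j + 0.5) * dp
            total += f(t, p) * math.sin(t) * dt * dp
    return total


x = (0.3, -1.1, 0.7)      # hypothetical real matrix element (a'|x|a''), arbitrary units
x2 = sum(xj * xj for xj in x)
angular = sphere_integral(lambda t, p: polarization_sum(x, t, p))

nu, e, c = 1.3, 1.0, 1.0  # hypothetical units; h cancels between (46) and the factor h*nu
per_direction = (2 * math.pi * nu) ** 3 * nu * e ** 2 / c ** 3   # h*nu times (46) per unit |x_a|^2
total_power = per_direction * angular
power_47 = 4 * (2 * math.pi * nu) ** 4 / (3 * c ** 3) * e ** 2 * x2
```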

In the same way the absorption coefficient, given by formula (59) of Chapter IX, becomes for photons

$$\frac{8\pi^3\nu}{c}\,|(a'|e x_a|a'')|^2.$$

This absorption coefficient refers to an incident beam of one photon crossing unit area per unit time per unit energy range. If we take one per unit frequency range instead of energy range, as is usual when dealing with radiation, the absorption coefficient becomes

$$\frac{8\pi^3\nu}{hc}\,|(a'|e x_a|a'')|^2.$$
This result is the same as (25) of §48, if we substitute for the E, there 


the energy $h\nu$ of a single photon. Thus the elementary theory of §48, in which the radiation field is treated as an external perturbation, gives the correct value for the absorption coefficient. The average absorption for all directions of motion and of polarization of the incident beam is

$$\frac{8\pi^3\nu}{3hc}\,|(a'|e\mathbf{x}|a'')|^2,$$

which is just equal to the emission coefficient (47) divided by the factor $8\pi h\nu^3/c^2$. This ratio for the absorption and emission coefficients is in agreement with Einstein's theory, discussed in §48.
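The Einstein ratio can be confirmed with a few lines of arithmetic. The sketch below assumes that the averaged absorption coefficient per unit frequency range is $8\pi^3\nu|(a'|e\mathbf{x}|a'')|^2/3hc$. It uses hypothetical values for $\nu$ and for the squared matrix element, in cgs units.

```python
import math

h = 6.626e-27   # Planck's constant, erg s
c = 2.998e10    # speed of light, cm/s
nu = 5.0e14     # hypothetical transition frequency, Hz
d2 = 1.0e-35    # hypothetical |(a'|e x|a'')|^2, esu^2 cm^2

emission = 4 * (2 * math.pi * nu) ** 4 / (3 * c ** 3) * d2    # expression (47)
absorption_avg = 8 * math.pi ** 3 * nu / (3 * h * c) * d2      # assumed averaged absorption coefficient
ratio = emission / absorption_avg
einstein_factor = 8 * math.pi * h * nu ** 3 / c ** 2
```

The ratio is independent of the hypothetical `d2`, and the two quantities agree to rounding.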

Let us now consider scattering. The true scattering coefficient is given by 
formula (38) of Chapter IX. Such scattering of photons will not be accompanied 
by any change of state of the atom on account of the factor $\delta_{a'a''}$ in the expression for the matrix element $(aa'|V|ba'')$ in (44). Thus the final energy $W'$ of the photon will equal its initial energy $W^0$. The scattering coefficient now reduces to

$$\frac{e^4}{m^2c^4}\cos^2\theta_{ab}.$$


This is the same as that given by classical mechanics for the scattering of radiation 
by a free electron. We thus see that the true scattering of radiation by an electron in 
an atom is independent of the atom and is correctly given by the classical theory. 
This result, it should be remembered, holds only provided the wave-length of 
the radiation is large compared with the dimensions of the atom. 
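The size of this true-scattering area is set by the classical electron radius e²/mc². The short sketch below evaluates it in Gaussian units; the rounded constants and the helper name are assumptions made for illustration, and θ_ab is the angle between the polarization directions of the incident and scattered photons.

```python
import math

# Rounded Gaussian (CGS) constants, assumed for this illustration.
e = 4.80320e-10      # electron charge, esu
m = 9.10938e-28      # electron mass, g
c = 2.99792458e10    # speed of light, cm/s

def true_scattering_cross_section(cos_theta_ab):
    """(e^4 / m^2 c^4) cos^2(theta_ab): effective area per unit solid angle."""
    return (e ** 2 / (m * c ** 2)) ** 2 * cos_theta_ab ** 2

r_e = e ** 2 / (m * c ** 2)   # classical electron radius, cm
print(f"e^2/mc^2 = {r_e:.4e} cm")
print(f"parallel polarizations: {true_scattering_cross_section(1.0):.4e} cm^2 per steradian")
print(f"perpendicular polarizations: {true_scattering_cross_section(0.0)}")
```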

The true scattering is a mathematical concept and cannot be separated out 
experimentally from the total scattering, given by formula (44) of Chapter IX. 
Let us see what this total scattering is in the case of photons. A modification must 
now be made in the application of formula (44) of Chapter IX. The summation Σ_k
in this formula may be considered as representing the contribution to the scattering 
of double transitions consisting of transitions firstly from the initial state to 
state k and secondly from state k to the final state. The first transition may be 
an absorption of the incident photon and the second an emission of the required 
scattered photon, but it is also possible for the first transition to be the emission 
and the second the absorption. It is clear from the general nature of the method 
used for deriving formula (44) of Chapter IX that both these kinds of double 
transitions must be included in the summation Σ_k when this formula is applied
to photons, although only the first of them was taken into account in the actual 
derivation given in Chapter IX. 

For the double transition of absorption followed by emission we must take, 
using zero, single prime and double prime to refer to the initial, final and 
intermediate k states of the atom, respectively, and a and b to refer to the absorbed 
and emitted photons respectively, 


$$(k|V|a\alpha^0) = (\alpha''|V|a\alpha^0), \qquad (b\alpha'|V|k) = (b\alpha'|V|\alpha''),$$
$$E^0 - E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') = h[\nu^0 - \nu(\alpha'',\alpha^0)],$$

where ν⁰ is the frequency of the incident photon and

$$h\nu(\alpha'',\alpha^0) = H_P(\alpha'') - H_P(\alpha^0).$$

Similarly, for the double transition of emission followed by absorption we must take

$$(k|V|a\alpha^0) = (b\alpha''|V|\alpha^0), \qquad (b\alpha'|V|k) = (\alpha'|V|a\alpha''),$$
$$E^0 - E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') - h\nu^0 - h\nu' = -h[\nu' + \nu(\alpha'',\alpha^0)],$$


where ν′ is the frequency of the scattered photon, there being now two photons,
of frequencies ν⁰ and ν′, in existence for the intermediate state k. The expression
for the scattering coefficient now reduces to 


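$$\frac{e^4}{h^2c^4}\,\frac{\nu'}{\nu^0}\,\Big|\frac{h}{m}\cos\theta_{ab}\,\delta_{\alpha^0\alpha'} + \sum_{\alpha''}\Big\{\frac{(\alpha'|\dot x_b|\alpha'')(\alpha''|\dot x_a|\alpha^0)}{\nu^0 - \nu(\alpha'',\alpha^0)} - \frac{(\alpha'|\dot x_a|\alpha'')(\alpha''|\dot x_b|\alpha^0)}{\nu' + \nu(\alpha'',\alpha^0)}\Big\}\Big|^2. \qquad (48)$$

If we write (48) in terms of x instead of ẋ, we get

$$\frac{(2\pi e)^4}{h^2c^4}\,\frac{\nu'}{\nu^0}\,\Big|\frac{\hbar}{2\pi m}\cos\theta_{ab}\,\delta_{\alpha^0\alpha'} - \sum_{\alpha''}\nu(\alpha',\alpha'')\,\nu(\alpha'',\alpha^0)\Big\{\frac{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)}{\nu^0 - \nu(\alpha'',\alpha^0)} - \frac{(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)}{\nu' + \nu(\alpha'',\alpha^0)}\Big\}\Big|^2. \qquad (49)$$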


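We can simplify (49) with the help of the quantum conditions. We have

$$x_b x_a - x_a x_b = 0,$$

which gives

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0) - (\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)\} = 0, \qquad (50)$$

and also

$$x_b\dot x_a - \dot x_a x_b = (1/m)(x_b p_a - p_a x_b) = (i\hbar/m)\cos\theta_{ab},$$

which gives, since (α″|ẋ_a|α⁰) = 2πiν(α″, α⁰)(α″|x_a|α⁰),

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')\,\nu(\alpha'',\alpha^0)\,(\alpha''|x_a|\alpha^0) - \nu(\alpha',\alpha'')\,(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)\} = \frac{1}{2\pi i}\,\frac{i\hbar}{m}\cos\theta_{ab}\,\delta_{\alpha^0\alpha'} = \frac{\hbar}{2\pi m}\cos\theta_{ab}\,\delta_{\alpha^0\alpha'}. \qquad (51)$$

Multiplying (50) by ν′ and adding to (51), we obtain

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)[\nu' + \nu(\alpha'',\alpha^0)] - (\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)[\nu' + \nu(\alpha',\alpha'')]\} = (\hbar/2\pi m)\cos\theta_{ab}\,\delta_{\alpha^0\alpha'}.$$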
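If we substitute this expression for (ħ/2πm) cos θ_ab δ_{α⁰α′} in (49), we obtain,
after a straightforward reduction making use of identical relations between the ν's,

$$\frac{(2\pi e)^4\,\nu^0\nu'^3}{h^2c^4}\,\Big|\sum_{\alpha''}\Big\{\frac{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)}{\nu^0 - \nu(\alpha'',\alpha^0)} - \frac{(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)}{\nu' + \nu(\alpha'',\alpha^0)}\Big\}\Big|^2. \qquad (52)$$
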
This gives the scattering coefficient in the form of the effective area that a photon
has to hit per unit solid angle of scattering. It is known as the Kramers-Heisenberg
dispersion formula, having been first obtained by these authors from analogies with
the classical theory of dispersion.

The fact that the various terms in (49) can be combined to give
the result (52) justifies the assumption made in deriving formula (44) of
Chapter IX, that the matrix elements (bα′|V|aα⁰) of the interaction energy are
of the second order of smallness compared with the (k|V|aα⁰) ones, at any rate
when the scattered particles are photons.
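The "straightforward reduction" from (49) to (52) can be spelled out as below; this is a sketch of one route, not necessarily the one intended in the text. Write u = ν(α″, α⁰) and w = ν(α′, α″), and use conservation of energy, hν⁰ + H_P(α⁰) = hν′ + H_P(α′), in the form ν⁰ − ν′ = w + u. Taking (49) to have the prefactor (2πe)⁴ν′/(h²c⁴ν⁰), and putting the sum obtained from (50) and (51) in place of its cos θ_ab term, the product (α′|x_b|α″)(α″|x_a|α⁰) inside the modulus carries the first coefficient below and the product (α′|x_a|α″)(α″|x_b|α⁰) carries minus the second:

```latex
\begin{aligned}
\nu' + u - \frac{w\,u}{\nu^0 - u}
  &= \frac{(\nu' + u)(\nu^0 - u) - (\nu^0 - \nu' - u)\,u}{\nu^0 - u}
   = \frac{\nu'\nu^0}{\nu^0 - u},\\[4pt]
\nu' + w - \frac{w\,u}{\nu' + u}
  &= \frac{(\nu' + w)(\nu' + u) - w\,u}{\nu' + u}
   = \frac{\nu'(\nu' + u + w)}{\nu' + u}
   = \frac{\nu'\nu^0}{\nu' + u}.
\end{aligned}
```

Hence the expression inside the modulus is ν′ν⁰ times the sum in (52), and multiplying the prefactor by (ν′ν⁰)² gives (2πe)⁴ν⁰ν′³/(h²c⁴).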


67. Einstein’s Laws of Radiation 


In the preceding section we determined the probability coefficients for absorption, 
emission and scattering of a photon by an atom. We were there concerned with only 
a single photon interacting with the atom (or at most with two), the interaction 
energy being given by (43) and (44). To complete our theory of radiation 
we require to know the laws governing the interaction of a number of photons 
with the atom. If the atom is exposed to an incident beam of radiation containing 
many photons, how do the absorption, emission and scattering probabilities depend 
on the intensity of the beam? 

This question cannot, of course, be answered simply from a consideration of 
the interaction energy, defined by (43) and (44), for a single photon. We have 
to rely* on the general interaction energy (31) for a number of photons. We shall
make use of the general result (31) of §49, according to which a transition
probability is proportional to the square of the modulus of the matrix element
of the perturbing energy that refers to this transition.

* Original: ‘fall back’.

Let us consider an absorption process in which the number of photons in state a
is reduced from n_a to n_a − 1, the atom simultaneously jumping from state α⁰ to
state α′. The probability of such a process will be proportional to the square of
the modulus of the matrix element


$$(n_1n_2\ldots n_a\ldots\alpha^0|H_Q|n_1n_2\ldots n_a - 1\ldots\alpha')$$


of the total interaction energy H_Q. The only term in the expression (31) for H_Q
which can contribute to this matrix element is $V_a n_a^{1/2} e^{-i\theta_a/\hbar}$. This matrix element
is thus proportional to $n_a^{1/2}$ and the transition probability is proportional to n_a,
the number of photons in the state from which the absorption takes place. It follows 
that the probability of an absorption process is proportional to the intensity of 
the incident radiation. 

Similarly, for an emission process, in which the number of photons in state a
is increased from n_a to n_a + 1, we must consider the matrix element

$$(n_1n_2\ldots n_a\ldots\alpha^0|H_Q|n_1n_2\ldots n_a + 1\ldots\alpha').$$

The only term in expression (31) that contributes to this is $V_a(n_a + 1)^{1/2}e^{i\theta_a/\hbar}$.
This matrix element is thus proportional to $(n_a + 1)^{1/2}$ and the transition probability
to n_a + 1. In the same way a scattering process, in which the number of photons
in state a is decreased from n_a to n_a − 1 and the number in state b is increased
from n_b to n_b + 1, is due to the term $V_{ab}n_a^{1/2}e^{-i\theta_a/\hbar}(n_b + 1)^{1/2}e^{i\theta_b/\hbar}$, if it is a true scattering
process, and to the product of the two terms $V_a n_a^{1/2}e^{-i\theta_a/\hbar}$ and $V_b(n_b + 1)^{1/2}e^{i\theta_b/\hbar}$,
if otherwise. The scattering probability is thus in any case proportional
to n_a(n_b + 1). To interpret these results, we can regard a proportionality to
an n as a proportionality to the intensity of the corresponding beam of radiation, 
but a proportionality to an (n + 1) can be understood only from a study of 
the connexion between the discrete photon states which we are here using and 
the actual continuous range of states which these discrete states replace. 
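The n_a^{1/2} and (n_a + 1)^{1/2} factors behave like the matrix elements of the usual photon annihilation and creation operators. The sketch below assumes that correspondence and works in a truncated number basis; the cut-off of 10 photons and the helper name are hypothetical choices. The phase factors drop out of the squared moduli, so they are omitted.

```python
import numpy as np

def annihilation(n_max):
    """Hypothetical helper: annihilation operator on |0>, ..., |n_max>,
    with <n-1| a |n> = sqrt(n)."""
    a = np.zeros((n_max + 1, n_max + 1))
    for n in range(1, n_max + 1):
        a[n - 1, n] = np.sqrt(n)
    return a

a = annihilation(10)
a_dag = a.T  # creation operator: the matrix is real, so its adjoint is its transpose

n_a, n_b = 4, 2
absorption = abs(a[n_a - 1, n_a]) ** 2      # |<n_a - 1| a |n_a>|^2
emission = abs(a_dag[n_a + 1, n_a]) ** 2    # |<n_a + 1| a^dagger |n_a>|^2
# Scattering a -> b: one photon removed from state a and one added to state b.
scattering = absorption * abs(a_dag[n_b + 1, n_b]) ** 2

print(absorption, emission, scattering)
```

Changing n_a and n_b shows the three probabilities growing as n_a, n_a + 1 and n_a(n_b + 1), the last two being nonzero even for empty final states.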

The work at the end of §37 shows that a discrete state must be counted as
a volume h³ of phase space for the photon. Thus a number n_a of photons in
a discrete state is to be counted as a distribution of h⁻³n_a photons per unit volume
per unit volume of momentum space, or c⁻³ν²n_a per unit volume per unit solid angle per
unit frequency range, since a photon of frequency ν has momentum P = hν/c, so that the
momentum-space volume P²dP dω equals h³c⁻³ν² dν dω. This corresponds to an energy density
of hc⁻³ν³n_a per unit solid angle per unit frequency range, or to an intensity

$$I_a = (h\nu^3/c^2)\,n_a$$

per unit frequency range.

The probability for an emission process, which we found was proportional 
to n_a + 1, is thus proportional to I_a + hν³/c². This means that with no incident
radiation there is still a certain amount of emission (which is, in fact, given by
expression (46)), but that the emission is increased or stimulated by incident
radiation in the same direction and having the same frequency (and state of
polarization) as the emitted radiation under consideration. Our present theory 
of radiation thus completes the imperfect one of §48, and gives a ratio for 
the stimulated and spontaneous emissions which is in agreement with Einstein’s 
theory of thermodynamic equilibrium mentioned in §48. 
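A sketch of how the factor I_a + hν³/c² reproduces Einstein's equilibrium condition. It assumes two non-degenerate atomic levels, and that absorption and emission between them carry the same squared matrix element, so that the absorption rate goes as I_a and the emission rate as I_a + hν³/c². It also checks that summing I_a over two polarizations and all directions gives Planck's energy density. The constants are rounded CGS values; ν and T are arbitrary.

```python
import math

# Rounded CGS constants (assumed values for this illustration).
h = 6.62607015e-27   # erg s
c = 2.99792458e10    # cm / s
k = 1.380649e-16     # erg / K

def planck_occupation(nu, T):
    """Mean number of photons per discrete state in black-body radiation."""
    return 1.0 / math.expm1(h * nu / (k * T))

def intensity(nu, n):
    """I = (h nu^3 / c^2) n per unit frequency range (one polarization, per unit solid angle)."""
    return h * nu ** 3 / c ** 2 * n

nu, T = 5.0e14, 5000.0
n = planck_occupation(nu, T)
I = intensity(nu, n)

# Detailed balance between the two levels: N_upper * (I + h nu^3/c^2) = N_lower * I.
population_ratio = I / (I + h * nu ** 3 / c ** 2)   # required N_upper / N_lower
boltzmann = math.exp(-h * nu / (k * T))
print(population_ratio, boltzmann)

# Two polarizations, 4 pi steradians, divided by c: energy density per unit frequency range.
u = 2 * 4 * math.pi * I / c
print(u, 8 * math.pi * h * nu ** 3 / c ** 3 * n)
```

The ratio n/(n + 1) equals the Boltzmann factor only because the emission rate carries the extra hν³/c² term; without it the equilibrium would not be Planckian.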

The probability for a scattering process from state a to state b, which we found 
was proportional to n_a(n_b + 1), is in the same way proportional to I_a(I_b + hν³/c²).
Thus the scattering of radiation is also stimulated by incident radiation in 
the same direction and having the same frequency as the scattered radiation. 
The stimulation phenomenon is, in fact, a general one, as has been shown by 
Einstein and Ehrenfest† from general statistical arguments.


† A. Einstein and P. Ehrenfest (1923). „Zur Quantentheorie des Strahlungsgleichgewichts“.
Zeitschrift für Physik 19(1), 301–306. doi:10.1007/bf01327565. See also W. Pauli, „Über das
thermische Gleichgewicht zwischen Strahlung und freien Elektronen“. Zeitschrift für Physik 18,
272–286 (1923). doi:10.1007/BF01327708


XII. RELATIVISTIC THEORY OF
THE ELECTRON 


68. Relativistic Treatment of a Particle 


THE theory we have been building up and applying from Chapter II onwards
is essentially a non-relativistic one. We have been working all the time with 
one particular Lorentz frame of reference and have not made it a requirement 
of the theory that its results should be independent of this frame of reference. 
The theory was established as an analogue of the classical non-relativistic 
dynamics. Let us now try to make it relativistic. 

In the first place we note that the general principle of superposition of states, 
as given in Chapter I, is a relativistic principle. It applies to ‘states’ with 
the relativistic space-time meaning. Beyond this, though, the theory does not 
lend itself very well to relativistic treatment, owing to the fundamental notion 
of an ‘observable’ not fitting in very well with the requirements of relativity. 
The measurement of an observable, in the theory we have been dealing with 
up to the present, has always consisted in the measurement of some dynamical 
variable at some instant of time in some Lorentz frame of reference and there 
does not seem to be any way of generalizing this notion of an observable to make 
it cease to refer to a particular Lorentz frame. In consequence one cannot set 
up a general scheme of relativistic quantum mechanics like that of Chapter II 
for the non-relativistic theory. All one can do is to solve special problems in 
a Lorentz-invariant way. This should not be regarded as a defect of the quantum 
theory, since it is in perfect analogy with the classical theory. Relativistic classical 
mechanics does not involve any such general scheme as the contact transformation 
of non-relativistic classical mechanics, but consists in the solution of comparatively 
special problems. 

One of the special problems that can be handled relativistically is that of 
the motion of a particle in an external field of force. Our non-relativistic 
quantum mechanics applied to this problem can be made to take a relativistic 
form merely by a slight change of notation. We use the representation in which 
the coordinates of the particle are diagonal, so that the representative of a state
is (xyz|), and adopt the Schrödinger picture, so that this representative varies
with the time t according to Schrödinger’s wave equation. If we now insert
the variable t explicitly in the wave function (xyz|), so that it reads (xyzt|), we can
regard the wave function as a relativistic thing involving the four variables x,
y, z, t on the same footing. Such relativistic wave functions form the basis of
the present theory. The ψ-symbols will now be used for the symbolic writing of
these relativistic wave functions and not of functions of x, y, z only.

The important differential operators that can operate on the ψ’s of the present
theory are those representing the components of momentum

$$p_x = -i\hbar\frac{\partial}{\partial x}, \qquad p_y = -i\hbar\frac{\partial}{\partial y}, \qquad p_z = -i\hbar\frac{\partial}{\partial z}, \qquad (1)$$

and a further one

$$W = i\hbar\frac{\partial}{\partial t} \qquad (2)$$

representing the energy. Note the difference in the sign in (1) and (2), a difference
which is required by relativity. The operators in (1) and (2) cannot be interpreted
as observables with the same degree of generality as the operators of non-relativistic
quantum mechanics, since when one of the former operates on a ψ representing
a state† that actually occurs in nature and thus satisfying the wave equation,
the resulting function will not in general satisfy the wave equation and will thus
not represent any actual state.† An exception to this occurs when the momentum or
energy is a constant of the motion, and such exceptions are the important practical 
cases when a measurement of momentum or energy is required. 


69. The Wave Equation for the Electron 


Let us consider first the case of the motion of an electron in the absence of 
an electromagnetic field, so that the problem is simply that of the free particle, 
which was discussed in §35. The Hamiltonian provided by classical mechanics for 
this system is given by equation (38) of §35, and leads to the wave equation 


$$\{W/c - (m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\}\psi = 0, \qquad (3)$$

where W and the p’s are to be interpreted as operators in accordance with
equations (1) and (2). Equation (3), although it takes into account correctly
the variation of the mass of the particle with its velocity, is yet unsatisfactory
from the point of view of relativity, because it is very unsymmetrical between W
and the p’s, so much so that one cannot generalize it in a relativistic way to the case
when there is a field present. We must therefore look for a new wave equation for
the free particle.

† The word ‘state’ is here used in the relativistic space-time sense.

If we multiply the wave equation (3) on the left by the operator
{W/c + (m²c² + p_x² + p_y² + p_z²)^{1/2}}, we obtain the equation

$$\{W^2/c^2 - m^2c^2 - p_x^2 - p_y^2 - p_z^2\}\psi = 0, \qquad (4)$$

which is of a relativistically invariant form and may therefore more conveniently be taken as
the basis of a relativistic theory. Equation (4) is not completely equivalent 
to equation (3) since, although every solution of (3) is also a solution of (4), 
the converse is not true. Only those solutions of (4) belonging to positive values 
for W are also solutions of (3). 

The wave equation (4) is not of the form required by the general laws 
of the quantum theory on account of its being quadratic in W. In §31 
we deduced from quite general arguments that the wave equation must be linear in 
the operator 0/0t or W, like equation (3) of that section. We therefore seek a wave 
equation that is linear in W and that is roughly equivalent to (4). In order that 
this wave equation shall transform in a simple way under a Lorentz transformation, 
we try to arrange that it shall be rational and linear in p_x, p_y and p_z as well as
in W, and thus of the form

$$\{W/c + \alpha_x p_x + \alpha_y p_y + \alpha_z p_z + \beta\}\psi = 0, \qquad (5)$$

where the α’s and β are independent of W and the p’s. Since we are considering
the case of no field, all points in space-time must be equivalent, so that the operator
in the wave equation must not involve x, y, z or t. Thus the α’s and β must also
be independent of x, y, z and t. They must therefore denote some quite new
dynamical variables, which may be pictured as describing some internal motion in
the electron. We shall see later that they just describe the spin of the electron.
The α’s and β must, of course, commute with W and the p’s and also with x, y,
z and t.

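Multiplying (5) by the operator {W/c − α_x p_x − α_y p_y − α_z p_z − β} on the left,
we obtain

$$\Big\{\frac{W^2}{c^2} - \sum_{xyz}\big[\alpha_x^2p_x^2 + (\alpha_x\alpha_y + \alpha_y\alpha_x)p_xp_y + (\alpha_x\beta + \beta\alpha_x)p_x\big] - \beta^2\Big\}\psi = 0,$$

where Σ_xyz denotes summation over the cyclic permutations of x, y, z.
This is the same as (4) if the α’s and β satisfy the relations

$$\alpha_x^2 = 1, \qquad \alpha_x\alpha_y + \alpha_y\alpha_x = 0,$$
$$\beta^2 = m^2c^2, \qquad \alpha_x\beta + \beta\alpha_x = 0,$$

together with the relations obtained from these by permuting x, y, and z.
If we write

$$\beta = \alpha_m mc,$$

these relations may be summed up in the single one,

$$\alpha_\mu\alpha_\nu + \alpha_\nu\alpha_\mu = 2\delta_{\mu\nu} \qquad (\mu, \nu = x, y, z \text{ or } m). \qquad (6)$$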


The four a’s all anticommute with one another and the square of each is unity. 

Thus by giving suitable properties to the α’s and β we can make the wave
equation (5) equivalent to (4), in so far as the motion of the electron as a whole 
is concerned. We may now assume (5) is the correct relativistic wave equation 
for the motion of an electron in the absence of a field. This gives rise to one 
difficulty, however, owing to the fact that (5), like (4), is not exactly equivalent 
to (3), but allows solutions corresponding to negative as well as positive values 
of W. The former do not, of course, correspond to any actually observable motion 
of an electron. For the present we shall consider only the positive-energy solutions 
and shall leave the discussion of the negative-energy ones to §75.

We can easily obtain a representation of the four α’s. They have similar
algebraic properties to the σ’s introduced in §19, which σ’s can be represented by
matrices with two rows and columns. So long as we keep to matrices with two rows
and columns we cannot get a representation of more than three anticommuting
quantities, and we have to go to four rows and columns to get a representation of
the four anticommuting α’s. It is convenient first to express the α’s in terms of
the σ’s and also of a second similar set of three anticommuting variables whose
squares are unity, ρ₁, ρ₂, ρ₃ say, that are independent of and commute with the σ’s.
We may take, amongst other possibilities,

$$\alpha_x = \rho_1\sigma_x, \qquad \alpha_y = \rho_1\sigma_y, \qquad \alpha_z = \rho_1\sigma_z, \qquad \alpha_m = \rho_3, \qquad (7)$$

and the α’s will then satisfy all the relations (6), as may easily be verified. If we now
take a representation with ρ₃ and σ_z diagonal, we shall get the following scheme
of matrices: 


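$$\sigma_x = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}, \qquad \sigma_y = \begin{pmatrix}0&-i&0&0\\i&0&0&0\\0&0&0&-i\\0&0&i&0\end{pmatrix}, \qquad \sigma_z = \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix},$$

$$\rho_1 = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}, \qquad \rho_2 = \begin{pmatrix}0&0&-i&0\\0&0&0&-i\\i&0&0&0\\0&i&0&0\end{pmatrix}, \qquad \rho_3 = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix}.$$
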


Corresponding to the four rows and columns there are four independent kets, 
so that the wave function will have four components. We saw in §61 that the spin 
of the electron requires the wave function to have two components. The fact 
that our present theory gives four is due to our wave equation (5) having twice 
as many solutions as it ought to have, half of them corresponding to states of 
negative energy. 
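The construction (7) and the relations (6) can be checked numerically. The sketch below assumes that ρ acts on the outer and σ on the inner factor of a Kronecker product, which reproduces the scheme of matrices given above with ρ₃ and σ_z diagonal; the helper name `free_operator` and the sample momentum are illustrative. It also shows that the operator α·p + α_m mc has eigenvalues ±(m²c² + p²)^{1/2}, each twice, which is the doubling into positive- and negative-energy solutions just described.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# sigma acts on the inner index and rho on the outer one, so every sigma commutes with every rho.
sigma = [np.kron(I2, s) for s in (sx, sy, sz)]
rho1, rho2, rho3 = (np.kron(s, I2) for s in (sx, sy, sz))

# Equation (7): alpha_r = rho_1 sigma_r for r = x, y, z, and alpha_m = rho_3.
alphas = [rho1 @ s for s in sigma] + [rho3]

# Relations (6): alpha_mu alpha_nu + alpha_nu alpha_mu = 2 delta_{mu nu}.
for i, ai in enumerate(alphas):
    for j, aj in enumerate(alphas):
        expected = 2 * np.eye(4) if i == j else np.zeros((4, 4))
        assert np.allclose(ai @ aj + aj @ ai, expected)

def free_operator(p, mc):
    """Hypothetical helper: alpha_x p_x + alpha_y p_y + alpha_z p_z + alpha_m m c."""
    return sum(a * pr for a, pr in zip(alphas[:3], p)) + alphas[3] * mc

p, mc = np.array([0.3, -1.2, 0.5]), 1.0
K = free_operator(p, mc)
# K squared is (m^2 c^2 + p^2) times the unit matrix, which is why (5) implies (4).
assert np.allclose(K @ K, (mc**2 + p @ p) * np.eye(4))
# From (5), W/c = -K, so W/c takes the values +(m^2c^2 + p^2)^(1/2) and -(m^2c^2 + p^2)^(1/2), twice each.
print(np.linalg.eigvalsh(K))
```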

With the help of (7), the wave equation (5) may be written in the vector form 


$$\{W/c + \rho_1(\sigma, p) + \rho_3 mc\}\psi = 0. \qquad (8)$$

To generalize this equation to the case when there is an electromagnetic
field present, we follow the classical rule of replacing W and p by W + eA₀
and p + (e/c)A, A₀ and A being the scalar and vector potentials of the field
at the place where the electron is. This gives us the equation

$$\Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\sigma, p + \frac{e}{c}A\Big) + \rho_3 mc\Big\}\psi = 0, \qquad (9)$$

which is the fundamental wave equation of the relativistic theory of the electron.
The conjugate imaginary equation

$$\bar\psi\Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\sigma, p + \frac{e}{c}A\Big) + \rho_3 mc\Big\} = 0 \qquad (10)$$

must be treated on the same footing as (9). The operators W and p in (10), 
which operate to the left, must be interpreted, according to §27, as having 
the meanings in equations (1) and (2) with the signs reversed. 


70. Invariance under a Lorentz Transformation 


Before proceeding to discuss the physical consequences of the wave equation (9) 
or (10), we shall first verify that our theory really is invariant under a Lorentz 
transformation, or, stated more accurately, that the physical results the theory 
leads to are independent of the Lorentz frame of reference used. This is not by any 
means obvious from the form of the wave equation (9). We have to verify that, 
if we write down the wave equation in a different Lorentz frame, the solutions 
of the new wave equation may be put into one-one correspondence with those of 
the original one in such a way that corresponding solutions may be assumed to 
represent the same state. For either Lorentz frame, the square of the modulus of 
the wave function, summed for the four components, should give the probability 
per unit volume of the electron being at any given place in that Lorentz frame. 
This probability is of the nature of an electric density (and will be called the electric 
density in future, for brevity), and its values, calculated in different Lorentz
frames for wave functions representing the same state, should be connected like
the time components in these frames of some 4-vector. Further, the 4-dimensional 
divergence of this 4-vector should vanish, signifying conservation of charge, or that 
the electron cannot appear or disappear in any volume without passing through 
the boundary. 

For discussing Lorentz transformations it is convenient to put p₀ for W/c and
to make the convention that terms containing a repeated suffix are to be summed
over the values 0, x, y, z for that suffix. This enables us to write equation (9) in
the form

$$\{\alpha_\mu(p_\mu + (e/c)A_\mu) + \alpha_m mc\}\psi = 0, \qquad (11)$$

α₀ being equal to unity, and similarly we can write equation (10) in the form

$$\bar\psi\{\alpha_\mu(p_\mu + (e/c)A_\mu) + \alpha_m mc\} = 0. \qquad (12)$$


We now apply a Lorentz transformation and denote quantities referring to 
the new frame by a star. The components of the 4-vectors p and A will transform 
according to a linear law of the type 


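$$p_\mu = a_{\mu\nu}p^*_\nu, \qquad A_\mu = a_{\mu\nu}A^*_\nu. \qquad (13)$$

Substituting these expressions for p_μ and A_μ in equations (11) and (12), we obtain

$$\{\alpha_\mu a_{\mu\nu}(p^*_\nu + (e/c)A^*_\nu) + \alpha_m mc\}\psi = 0$$
$$\text{and}\quad \bar\psi\{\alpha_\mu a_{\mu\nu}(p^*_\nu + (e/c)A^*_\nu) + \alpha_m mc\} = 0. \qquad (14)$$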


We now try to bring these equations back to the form of the original (11) and (12) 
by introducing a new wave function ψ*, whose four components are linear functions
(with constant numerical coefficients) of the four components of the original ψ.
This means that ψ* is connected with ψ by an equation of the type

$$\psi^* = \gamma\psi, \qquad (15)$$

where γ is an operator like the α’s, which can be represented as a matrix with four
rows and columns. The conjugate imaginary equation to (15) is

$$\bar\psi^* = \bar\psi\bar\gamma. \qquad (16)$$

Equations (14) will go over into the equations

$$\bar\gamma\{\alpha_\nu(p^*_\nu + (e/c)A^*_\nu) + \alpha_m mc\}\psi^* = 0$$
$$\text{and}\quad \bar\psi^*\{\alpha_\nu(p^*_\nu + (e/c)A^*_\nu) + \alpha_m mc\}\gamma = 0, \qquad (17)$$

provided we can choose γ such that

$$\bar\gamma\alpha_\nu\gamma = \alpha_\mu a_{\mu\nu}, \qquad \bar\gamma\alpha_m\gamma = \alpha_m. \qquad (18)$$

These equations (17) are of the same form as (11) and (12), as required,
since one can divide out by the extra factors γ̄ and γ.

In order to verify that we can always choose y to satisfy equations (18), 
let us first take the special case when the change of our frame of reference 
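consists simply of a rotation through a hyperbolic angle θ in the xt plane, so that
the transformation equations for the components of a 4-vector are of the type

$$p_0 = p_0^*\cosh\theta + p_x^*\sinh\theta,$$
$$p_x = p_0^*\sinh\theta + p_x^*\cosh\theta, \qquad (19)$$
$$p_y = p_y^*, \qquad p_z = p_z^*.$$

The values of the a_μν may be written down at once from a comparison of
these equations with (13). With these values for the a_μν, it is easy to see that
equations (18) hold when we take

$$\gamma = e^{\frac12\theta\alpha_x}. \qquad (20)$$

We have, in fact, since γ̄ = γ (α_x being Hermitian),

$$\bar\gamma\alpha_0\gamma = \bar\gamma\gamma = e^{\theta\alpha_x} = 1 + \theta\alpha_x + \theta^2\alpha_x^2/2! + \theta^3\alpha_x^3/3! + \cdots.$$

On account of α_x² = 1, this reduces to

$$\bar\gamma\alpha_0\gamma = \{1 + \theta^2/2! + \cdots\} + \alpha_x\{\theta + \theta^3/3! + \cdots\} = \cosh\theta + \alpha_x\sinh\theta = \alpha_0\cosh\theta + \alpha_x\sinh\theta.$$

Again,

$$\bar\gamma\alpha_x\gamma = \alpha_x\bar\gamma\gamma = \alpha_0\sinh\theta + \alpha_x\cosh\theta.$$

Further,

$$\bar\gamma\alpha_y\gamma = e^{\frac12\theta\alpha_x}\alpha_y e^{\frac12\theta\alpha_x} = \alpha_y e^{-\frac12\theta\alpha_x}e^{\frac12\theta\alpha_x} = \alpha_y,$$

since α_x anticommutes with α_y, which results in α_y f(α_x) = f(−α_x)α_y for any
function f(α_x) of α_x. Similarly,

$$\bar\gamma\alpha_z\gamma = \alpha_z, \qquad \bar\gamma\alpha_m\gamma = \alpha_m.$$

Thus the five equations (18) hold with γ given by (20) when the a_μν are given
by (19).
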


As a second typical change of the frame of reference, we may consider a rotation
through an angle θ in ordinary space about the x-axis. The transformation
equations are now

$$p_0 = p_0^*, \qquad p_x = p_x^*,$$
$$p_y = p_y^*\cos\theta + p_z^*\sin\theta,$$
$$p_z = -p_y^*\sin\theta + p_z^*\cos\theta.$$

With the new values for the a_μν, we can easily verify that equations (18) hold with

$$\gamma = e^{\frac12\theta\alpha_z\alpha_y} = e^{-\frac12 i\theta\sigma_x},$$

the analysis being very similar to the preceding case.

If two changes of the frame of reference are made consecutively, we simply have
to multiply the corresponding γ’s to get the γ for the resultant change. Now any
change of the frame of reference may be built up from a succession of rotations of the two types
we have considered, and hence there will always be a γ satisfying (18).
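Equations (18) for the two special changes of frame can be checked with explicit 4 × 4 matrices. The sketch below takes γ̄ to be the conjugate transpose of γ, builds the α's from (7), and, as an assumption of this illustration, uses γ = e^{½θα_x} for the boost (19) and γ = e^{½θα_zα_y} for the rotation of p_y, p_z through θ. The exponentials are written in closed form using α_x² = 1 and (α_zα_y)² = −1. The last check illustrates the composition rule: a boost followed by a rotation has a_μν given by the matrix product and γ given by γ_rotation γ_boost.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
rho1, rho3 = np.kron(sx, I2), np.kron(sz, I2)
a0 = np.eye(4, dtype=complex)
ax, ay, az = (rho1 @ np.kron(I2, s) for s in (sx, sy, sz))  # equation (7)
am = rho3
alpha = [a0, ax, ay, az]  # index order mu = 0, x, y, z

def satisfies_18(gamma, a):
    """Check gamma-bar alpha_nu gamma = alpha_mu a[mu, nu] for all nu, and gamma-bar alpha_m gamma = alpha_m."""
    gbar = gamma.conj().T
    ok = all(np.allclose(gbar @ alpha[nu] @ gamma,
                         sum(alpha[mu] * a[mu, nu] for mu in range(4)))
             for nu in range(4))
    return bool(ok and np.allclose(gbar @ am @ gamma, am))

theta = 0.7
ch, sh = np.cosh(theta), np.sinh(theta)
boost = np.array([[ch, sh, 0, 0], [sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])  # (19): p_mu = a[mu, nu] p*_nu
gamma_boost = np.cosh(theta / 2) * a0 + np.sinh(theta / 2) * ax                   # exp(theta alpha_x / 2)

co, si = np.cos(theta), np.sin(theta)
rotation = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, co, si], [0, 0, -si, co]])
gamma_rot = np.cos(theta / 2) * a0 + np.sin(theta / 2) * (az @ ay)                # exp(theta alpha_z alpha_y / 2)

print(satisfies_18(gamma_boost, boost))
print(satisfies_18(gamma_rot, rotation))
print(satisfies_18(gamma_rot @ gamma_boost, boost @ rotation))  # composition of the two changes
```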

In this way we see that the solutions of the wave equations in the new frame of 
reference, equations (17), can be put into a natural one-one correspondence with 
those of the original wave equations (11) and (12), corresponding solutions being 
connected by (15) and (16), and we may assume that corresponding solutions 
represent the same state. It remains for us to verify that the electric density 
transforms like the time component of a 4-vector and that the divergence of 
this 4-vector vanishes. 

We shall introduce the notation φ.ψ to denote the sum of the products of
each of the four components of φ with the corresponding component of ψ.
In the same way φξ.ηψ, where ξ and η are any linear operators that can operate
on the wave functions, will denote the sum of the products of each component
of φξ with the corresponding component of ηψ. Our new symbols of the type φξ.ηψ
are functions of x, y, z and t, and are quite distinct from the products φξηψ of
Chapter II, which are just numbers. It should be noted that

$$\phi.\alpha\psi = \phi\alpha.\psi \qquad (21)$$

when α is one of the α’s in the wave equation, or more generally when it is any
operator which means simply taking four linear functions (whose coefficients are
numbers or functions of x, y, z and t) of the four components of the wave function.

We can now express the electric density as ψ̄.ψ, which is the same
as ψ̄.α₀ψ or ψ̄α₀.ψ since α₀ = 1. Let us see how the four quantities ψ̄.α_μψ,
with μ = 0, x, y, z, transform under a Lorentz transformation. We have,
from (15), (16) and (18), 


$$\bar\psi^*.\alpha_\mu\psi^* = \bar\psi\bar\gamma.\alpha_\mu\gamma\psi = \bar\psi.\bar\gamma\alpha_\mu\gamma\psi = \bar\psi.\alpha_\nu a_{\nu\mu}\psi = (\bar\psi.\alpha_\nu\psi)\,a_{\nu\mu}.$$

Comparing this result with (13), we see that the four quantities ψ̄.α_μψ transform
like the covariant components of a 4-vector. The contravariant components will be

$$\bar\psi.\psi, \qquad -\bar\psi.\alpha_x\psi, \qquad -\bar\psi.\alpha_y\psi, \qquad -\bar\psi.\alpha_z\psi. \qquad (22)$$

This verifies that our electric density ψ̄.ψ is the time component of a 4-vector and
that the corresponding space components are −ψ̄.α_rψ (with r = x, y, z). These space
components multiplied by the factor c give the electric current, or, stated more 
accurately, the probability of the electron crossing unit area per unit time. 

The divergence of our 4-vector is 


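$$\sum_\mu \pm\frac{\partial}{\partial x_\mu}(\bar\psi.\alpha_\mu\psi), \qquad (23)$$

where x₀ denotes ct and the ± sign means that the + sign is to be taken for μ = 0
and the − sign for μ = x, y, z before one does the summation. To prove this
divergence vanishes, multiply equation (11) by ψ̄ and (12) by ψ, taking the sum
over the four components in each case, and subtract. The result is

$$\bar\psi.\alpha_\mu p_\mu\psi - \bar\psi p_\mu.\alpha_\mu\psi = 0,$$

the other terms cancelling on account of (21). With the help of (1) and (2),
the operators p_μ acting to the left having their signs reversed, this gives

$$i\hbar\sum_\mu \pm\frac{\partial}{\partial x_\mu}(\bar\psi.\alpha_\mu\psi) = 0,$$

which just expresses the vanishing of the divergence (23). In this way we complete the proof that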
our theory gives consistent results in whichever frame of reference it is applied. 


71. The Motion of a Free Electron 


It is of interest to consider the motion of a free electron in the above theory 
according to the Heisenberg picture and to study the Heisenberg equations of 
motion. These equations of motion can be integrated exactly, as was first done by 
Erwin Schrödinger.†

As Hamiltonian we must take the expression which we get as equal to W when
we put the operator on ψ in (8) equal to zero, i.e.

$$H = -c\rho_1(\sigma, p) - \rho_3 mc^2 = -c(\alpha, p) - \alpha_m mc^2. \qquad (24)$$

† Erwin Schrödinger, „Über die kräftefreie Bewegung in der relativistischen
Quantenmechanik“. Sonderausgabe aus den Sitzungsberichten der Preußischen Akademie
der Wissenschaften, Physikalisch-mathematische Klasse (1930), XXIV, 418–428.

We see at once that the momentum commutes with H and is thus a constant of
the motion. Further, the x-component of the velocity is

$$\dot x = [x, H] = -c\alpha_x. \qquad (25)$$


This result is rather surprising, as it means an altogether different relation between 
velocity and momentum from what one has in classical mechanics. It is connected, 
however, with the expressions (22) for the charge density and current. The ẋ
given by (25) has as eigenvalues ±c, corresponding to the eigenvalues ±1 of α_x.
As y and z are similar, we can conclude that a measurement of a component of
the velocity of a free electron is certain to lead to the result ±c. This conclusion is
easily seen to hold also when there is a field present. 

Since electrons are observed in practice to have velocities considerably less than 
that of light, it would seem that we have here a contradiction with experiment. 
The contradiction is not real, though, since the theoretical velocity in the above 
conclusion is the velocity at one instant of time while observed velocities are always 
average velocities through appreciable time intervals. We shall find upon further 
examination of the equations of motion that the velocity is not at all constant, 
but oscillates rapidly about a mean value which agrees with the observed value. 

It may easily be verified that a measurement of a component of the velocity
must lead to the result ±c in a relativistic theory, simply from an elementary
application of the principle of uncertainty of §28. To measure the velocity we must
measure the position at two slightly different times and then divide the change of 
position by the time interval. (It will not do to measure the momentum and apply 
a formula, as the ordinary connexion between velocity and momentum is not valid.) 
In order that our measured velocity may approximate to the instantaneous velocity, 
the time interval between the two measurements of position must be very short 
and hence these measurements must be very accurate. The great accuracy with 
which the position of the electron is known during the time interval must give rise, 
according to the principle of uncertainty, to an almost complete indeterminacy in 
its momentum. This means that almost all values of the momentum are equally 
probable, so that the momentum is almost certain to be infinite. An infinite value 
for a component of momentum corresponds to the value ±c for the corresponding
component of velocity. 

Let us now examine how the velocity of the electron varies with time. We have 


$$i\hbar\dot\alpha_x = \alpha_x H - H\alpha_x.$$

Now since α_x anticommutes with all the terms in H except −cα_x p_x,

$$\alpha_x H + H\alpha_x = -\alpha_x c\alpha_x p_x - c\alpha_x p_x\alpha_x = -2cp_x,$$

and hence

$$i\hbar\dot\alpha_x = 2\alpha_x H + 2cp_x = -2H\alpha_x - 2cp_x. \qquad (26)$$

Since H and p_x are constants, it follows from the first of equations (26) that

$$i\hbar\ddot\alpha_x = 2\dot\alpha_x H. \qquad (27)$$

This differential equation in α̇_x can be integrated immediately, the result being

$$\dot\alpha_x = \dot\alpha_x^0 e^{-2iHt/\hbar}, \qquad (28)$$

where α̇_x⁰ is a constant, equal to the value of α̇_x when t = 0. The factor e^{−2iHt/ħ}
must be put to the right of the factor α̇_x⁰ in (28) on account of the H occurring to
the right of the α̇_x in (27). The second of equations (26) leads in the same way to
the result

$$\dot\alpha_x = e^{2iHt/\hbar}\dot\alpha_x^0.$$

We can now easily complete the integration of the equation of motion for x.
From (28) and the first of equations (26)

$$\alpha_x = \tfrac12 i\hbar\dot\alpha_x^0 e^{-2iHt/\hbar}H^{-1} - cp_xH^{-1}, \qquad (29)$$

and hence the time-integral of equation (25) is

$$x = \tfrac14 c\hbar^2\dot\alpha_x^0 e^{-2iHt/\hbar}H^{-2} + c^2p_xH^{-1}t + a_x, \qquad (30)$$

a_x being a constant.

From (29) we see that the $x$ component of velocity, $-c\alpha_x$, consists of two parts, a constant part $c^2p_xH^{-1}$, connected with the momentum by the classical relativistic formula, and an oscillatory part

$$ -\tfrac12 ic\hbar\dot\alpha_x^0e^{-2iHt/\hbar}H^{-1}, $$

whose frequency is high, being $2H/h$, which is at least $2mc^2/h$. Only the constant part would be observed in a practical measurement of velocity, such a measurement giving the average velocity through a time-interval much larger than $h/2mc^2$. The oscillatory part secures that the instantaneous value of $\dot x$ shall have the eigenvalues $\pm c$. The oscillatory part of $x$ is small, being, according to (30),

$$ \tfrac14 c\hbar^2\dot\alpha_x^0e^{-2iHt/\hbar}H^{-2} = -\tfrac12 ic\hbar\,(\alpha_x + cp_xH^{-1})H^{-1}, $$

which is of the order of magnitude $\hbar/mc$, since $(\alpha_x + cp_xH^{-1})$ is of the order of magnitude unity.
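The closed forms (28) and (29) can be checked numerically. For a free electron $p_x$, $p_y$, $p_z$ commute with $H$, so within one plane-wave sector they may be replaced by numbers and $H$ becomes a $4\times4$ matrix. The Python sketch below assumes units with $\hbar = c = m = 1$ and one standard representation of $\alpha_x, \alpha_y, \alpha_z, \alpha_m$ (any set obeying the anticommutation relations would serve), and compares the Heisenberg-picture $e^{iHt}\alpha_xe^{-iHt}$ with (29). It also evaluates, with CODATA values that are not part of the text, the frequency $2mc^2/h$ and the length $\hbar/mc$ quoted above.

```python
import numpy as np

# Pauli matrices and one standard 4x4 representation of alpha_x, alpha_y, alpha_z, alpha_m.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
Z2 = np.zeros((2, 2), dtype=complex)


def off_diag(s):
    """Return the block matrix [[0, s], [s, 0]]."""
    return np.block([[Z2, s], [s, Z2]])


ax, ay, az = off_diag(sx), off_diag(sy), off_diag(sz)
am = np.block([[I2, Z2], [Z2, -I2]])


def exp_minus_i(H, t):
    """exp(-i H t) for a Hermitian matrix H, computed from its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T


# Units hbar = c = m = 1; p is the momentum of a single plane wave (illustrative values).
p = np.array([0.3, -0.2, 0.5])
H = -(p[0] * ax + p[1] * ay + p[2] * az) - am  # H = -c(alpha, p) - alpha_m m c^2
H_inv = np.linalg.inv(H)
alpha_dot0 = (ax @ H - H @ ax) / 1j  # from i hbar d(alpha_x)/dt = alpha_x H - H alpha_x


def alpha_x_heisenberg(t):
    """alpha_x(t) = e^{iHt} alpha_x e^{-iHt}, computed directly."""
    U = exp_minus_i(H, t)
    return U.conj().T @ ax @ U


def alpha_x_closed_form(t):
    """Equation (29): (i/2) alpha_x'^0 e^{-2iHt} H^{-1} - p_x H^{-1}."""
    return 0.5j * alpha_dot0 @ exp_minus_i(2 * H, t) @ H_inv - p[0] * H_inv


# SI values (CODATA 2018) for the frequency 2mc^2/h and the length hbar/mc.
m_e, c_light, h = 9.1093837015e-31, 299792458.0, 6.62607015e-34
zb_frequency = 2 * m_e * c_light**2 / h
reduced_compton = h / (2 * np.pi) / (m_e * c_light)

for t in (0.0, 1.0, 2.5):
    diff = np.abs(alpha_x_heisenberg(t) - alpha_x_closed_form(t)).max()
    print(f"t = {t}: largest difference {diff:.1e}")
print(f"2mc^2/h = {zb_frequency:.3e} Hz, hbar/mc = {reduced_compton:.3e} m")
```

The printed differences should stay at rounding level for every $t$. The two SI numbers come out near $2.47\times10^{20}$ Hz and $3.86\times10^{-13}$ m, which illustrates why only the constant part of the velocity is seen in practice.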




72. Existence of the Spin 


In §69 we saw that the correct wave equation for the electron in the absence 
of an electromagnetic field, namely equation (5) or (8), is equivalent to 
the wave equation (4) which is suggested from analogy with the classical theory. 
This equivalence no longer holds when there is a field. By treating the correct 
wave equation for this case, namely (9), in the same way as we treated (5) and 
comparing it with the wave equation to be expected from analogy with the classical theory, namely

$$ \Big[\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(p + \frac{e}{c}A\Big)^2 - m^2c^2\Big]\psi = 0, \tag{31} $$

in which the operator is just the classical relativistic Hamiltonian, we may expect to get an indication of the new physical features of the present theory.

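We must multiply (9) by some factor on the left to make it resemble (31) as closely as possible. Taking this factor to be

$$ \frac{W}{c} + \frac{e}{c}A_0 - \rho_1\Big(\sigma, p + \frac{e}{c}A\Big) - \rho_3mc, $$

we get

$$ \Big[\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\sigma, p + \frac{e}{c}A\Big)^2 - m^2c^2 + \rho_1\Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\sigma, p + \frac{e}{c}A\Big) - \Big(\sigma, p + \frac{e}{c}A\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big\}\Big]\psi = 0. \tag{32} $$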


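We now use the general formula that, if $B$ and $C$ are any two vectors that commute with $\sigma$,

$$ \begin{aligned} (\sigma, B)(\sigma, C) &= \sum_{xyz}\sigma_x^2B_xC_x + \sum_{xyz}(\sigma_x\sigma_yB_xC_y + \sigma_y\sigma_xB_yC_x) \\ &= (B, C) + i\sum_{xyz}\sigma_z(B_xC_y - B_yC_x) \\ &= (B, C) + i(\sigma, B\times C). \end{aligned} \tag{33} $$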
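Formula (33) is easy to confirm numerically when $B$ and $C$ have ordinary number components. The sketch below builds the Pauli matrices and compares both sides for random complex vectors chosen only for illustration. With number components $B\times B = 0$; with the operator components $p + (e/c)A$ used next, $B\times B$ does not vanish, and that is exactly where the $\operatorname{curl}A$ term comes from.

```python
import numpy as np

# Pauli matrices sigma_x, sigma_y, sigma_z stacked along the first axis.
SIGMA = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)


def sigma_dot(v):
    """(sigma, v) = sigma_x v_x + sigma_y v_y + sigma_z v_z for a 3-vector v of numbers."""
    return np.einsum("kij,k->ij", SIGMA, v)


def identity_33(B, C):
    """Both sides of (33) for numeric vectors B and C (numbers commute with sigma)."""
    lhs = sigma_dot(B) @ sigma_dot(C)
    rhs = (B @ C) * np.eye(2) + 1j * sigma_dot(np.cross(B, C))
    return lhs, rhs


rng = np.random.default_rng(0)
B = rng.normal(size=3) + 1j * rng.normal(size=3)
C = rng.normal(size=3) + 1j * rng.normal(size=3)
lhs, rhs = identity_33(B, C)
print(np.allclose(lhs, rhs))  # True
```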


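Taking $B = C = p + (e/c)A$, we find, since

$$ \Big(p + \frac{e}{c}A\Big)\times\Big(p + \frac{e}{c}A\Big) = \frac{e}{c}\{p\times A + A\times p\} = -i\hbar\frac{e}{c}\operatorname{curl}A = -i\hbar\frac{e}{c}\mathcal H, $$

where $\mathcal H$ is the magnetic field, that

$$ \Big(\sigma, p + \frac{e}{c}A\Big)^2 = \Big(p + \frac{e}{c}A\Big)^2 + \frac{\hbar e}{c}(\sigma, \mathcal H). $$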
C C C 
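Also we have

$$ \begin{aligned} &\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\sigma, p + \frac{e}{c}A\Big) - \Big(\sigma, p + \frac{e}{c}A\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big) \\ &\qquad = \frac{e}{c}\Big\{\frac1c(\sigma, WA - AW) + (\sigma, A_0p - pA_0)\Big\} \\ &\qquad = \frac{i\hbar e}{c}\Big(\sigma, \frac1c\frac{\partial A}{\partial t} + \operatorname{grad}A_0\Big) = -\frac{i\hbar e}{c}(\sigma, \mathcal E), \end{aligned} $$

where $\mathcal E$ is the electric field. Thus (32) becomes

$$ \Big[\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(p + \frac{e}{c}A\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\sigma, \mathcal H) - i\rho_1\frac{\hbar e}{c}(\sigma, \mathcal E)\Big]\psi = 0. $$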
Gs C c c 


This equation differs from (31) through having two extra terms in the operator. The electron according to the present theory is more closely analogous to a classical system with the Hamiltonian function

$$ \Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(p + \frac{e}{c}A\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\sigma, \mathcal H) - i\rho_1\frac{\hbar e}{c}(\sigma, \mathcal E). $$


If we neglect relativistic corrections, so that we can put $W = mc^2 + W_1$ and count $W_1$ as small, this Hamiltonian reduces, after division throughout by $2m$, to

$$ W_1 - \Big[-eA_0 + \frac{1}{2m}\Big(p + \frac{e}{c}A\Big)^2 + \frac{\hbar e}{2mc}(\sigma, \mathcal H) + i\rho_1\frac{\hbar e}{2mc}(\sigma, \mathcal E)\Big]. $$


We can now see that the two extra terms may be considered approximately as due to the electron possessing an additional potential energy of amount

$$ \frac{\hbar e}{2mc}(\sigma, \mathcal H) + i\rho_1\frac{\hbar e}{2mc}(\sigma, \mathcal E), $$


which may be interpreted as arising from the electron having a magnetic moment $-(\hbar e/2mc)\sigma$ and an electric moment $-i\rho_1(\hbar e/2mc)\sigma$. This magnetic moment is in agreement with the assumption of §43 and is what is required by experiment. The electric moment, on the other hand, is an imaginary quantity† and thus cannot be considered as having a physical meaning. The Hamiltonian of our original wave equation (9) is real, and the imaginary term has appeared only on account of our having performed a rather artificial operation to get a Hamiltonian that can be compared with the classical one.

† [‘pure’ omitted]

The spin angular momentum does not give rise to any potential energy and 
therefore does not appear in the result of the preceding calculation. The simplest 
way of showing the existence of the spin angular momentum is to take the case 
of the motion of a free electron or an electron in a central field of force 
and determine the angular momentum integrals. This means working with the Hamiltonian (24), or with this Hamiltonian generalized by the addition to it of a potential energy $-eA_0$ which may be any function of the radius $r$, thus

$$ H = -eA_0(r) - c\rho_1(\sigma, p) - \rho_3mc^2. \tag{34} $$


With either Hamiltonian we find for the rate of change of the $x$-component of orbital angular momentum, $m_x = yp_z - zp_y$, with the help of commutability relations proved in §§38 and 40,

$$ \begin{aligned} i\hbar\dot m_x = m_xH - Hm_x &= -c\rho_1\{m_x(\sigma, p) - (\sigma, p)m_x\} \\ &= -c\rho_1(\sigma, m_xp - pm_x) \\ &= -i\hbar c\rho_1\{\sigma_yp_z - \sigma_zp_y\}. \end{aligned} $$

Thus $\dot m_x \neq 0$ and the orbital angular momentum is not a constant of the motion.
This result is to be expected from the integrated equation of motion (30), the oscillatory part of the motion there displayed giving rise to an oscillatory term in the angular momentum.

As a further equation of motion with the Hamiltonian (24) or (34), we have

$$ \begin{aligned} i\hbar\dot\sigma_x = \sigma_xH - H\sigma_x &= -c\rho_1\{\sigma_x(\sigma, p) - (\sigma, p)\sigma_x\} \\ &= -c\rho_1(\sigma_x\sigma - \sigma\sigma_x, p) \\ &= -2ic\rho_1\{\sigma_zp_y - \sigma_yp_z\}, \end{aligned} $$

with the help of equations (55) of §19. Hence

$$ i\hbar(\dot m_x + \tfrac12\hbar\dot\sigma_x) = 0, $$

so that the vector $m + \frac12\hbar\sigma$ is a constant of the motion. This result one can interpret by saying the electron has a spin angular momentum $\frac12\hbar\sigma$, which must be added to the orbital angular momentum $m$ before one gets a constant of the motion.
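The spin part of this calculation involves only finite matrices, so it can be checked directly. In the sketch below the $\rho$'s and $\sigma$'s are realized as Kronecker products, so every $\rho$ commutes with every $\sigma$; $p$ and $eA_0$ are replaced by numbers, which is legitimate for $\dot\sigma_x$ because $\sigma$ commutes with both. The orbital rate $\dot m_x$ needs $x$ and $p$ as operators, so it is not computed but copied from the result above; the check then shows that it cancels $\frac12\hbar\dot\sigma_x$. Units $c = m = 1$ and the numerical values are assumptions made for illustration.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
I2 = np.eye(2, dtype=complex)

# rho matrices act on the first tensor factor and sigma matrices on the second,
# so each rho commutes with each sigma, as the theory requires.
rho1, rho3 = np.kron(PAULI[0], I2), np.kron(PAULI[2], I2)
sig = [np.kron(I2, s) for s in PAULI]


def hamiltonian_34(p, eA0=0.25, c=1.0, m=1.0):
    """H = -eA0 - c rho1 (sigma, p) - rho3 m c^2, with p and eA0 replaced by numbers."""
    sigma_p = sum(pk * sk for pk, sk in zip(p, sig))
    return -eA0 * np.eye(4) - c * rho1 @ sigma_p - rho3 * m * c**2


def rates_x(p, c=1.0, hbar=1.0):
    """Return (i hbar d(sigma_x)/dt, i hbar dm_x/dt) for numeric p."""
    H = hamiltonian_34(p, c=c)
    spin = sig[0] @ H - H @ sig[0]  # computed as a commutator
    # The orbital rate needs x and p as operators, so it is copied from the text:
    # i hbar dm_x/dt = -i hbar c rho1 {sigma_y p_z - sigma_z p_y}.
    orbital = -1j * hbar * c * rho1 @ (sig[1] * p[2] - sig[2] * p[1])
    return spin, orbital


p = np.array([0.4, -1.1, 0.7])
spin, orbital = rates_x(p)
print(np.allclose(spin, -2j * rho1 @ (sig[2] * p[1] - sig[1] * p[2])))  # True
print(np.allclose(orbital + 0.5 * spin, 0))  # True: m_x + (1/2) hbar sigma_x is conserved
```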




The same vector o fixes the directions of both the spin magnetic moment and 
the spin angular momentum. If an electron in a certain state of spin has a spin angular momentum of $\frac12\hbar$ in a particular direction, it will have a magnetic moment $-e\hbar/2mc$ in the same direction.


73. Transition to Polar Variables 


For the further study of the motion of an electron in a central field of force 
with the Hamiltonian (34), it is convenient to make a transformation to polar 
coordinates, as was done in §40 in the non-relativistic case. We can introduce $r$ and $p_r$ as before, but instead of $k$, the magnitude of the orbital angular momentum $m$, which is no longer a constant of the motion, we must now use the magnitude of the total angular momentum $M = m + \frac12\hbar\sigma$. Let us put

$$ j^2\hbar^2 = M_x^2 + M_y^2 + M_z^2 + \tfrac14\hbar^2. \tag{35} $$

The eigenvalues of $m_z$ are integral multiples of $\hbar$, those of $\frac12\hbar\sigma_z$ are $\pm\frac12\hbar$, and hence those of $M_z$ must be half-odd integral multiples of $\hbar$. It follows from the theory of §39 that the eigenvalues of $j$ must be integers.

If in formula (33) we take $B = C = m$, we get

$$ \begin{aligned} (\sigma, m)^2 &= m^2 + i(\sigma, m\times m) \\ &= m^2 - \hbar(\sigma, m) \\ &= (m + \tfrac12\hbar\sigma)^2 - 2\hbar(\sigma, m) - \tfrac34\hbar^2. \end{aligned} $$

Hence

$$ \{(\sigma, m) + \hbar\}^2 = M^2 + \tfrac14\hbar^2. $$

Thus $(\sigma, m) + \hbar$ is a quantity whose square is $M^2 + \frac14\hbar^2$ and we could, consistently with equation (35), define $j\hbar$ as $(\sigma, m) + \hbar$. This would not be the most convenient definition for $j$, however, since we would like to have $j$ a constant of the motion and $(\sigma, m) + \hbar$ is not constant. We have, in fact, from applications of (33),


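$$ (\sigma, m)(\sigma, p) = i(\sigma, m\times p) $$

and

$$ (\sigma, p)(\sigma, m) = i(\sigma, p\times m), $$

the scalar products $(m, p)$ and $(p, m)$ being zero, so that

$$ \begin{aligned} (\sigma, m)(\sigma, p) + (\sigma, p)(\sigma, m) &= i\sum_{xyz}\sigma_x\{m_yp_z - m_zp_y + p_ym_z - p_zm_y\} \\ &= i\sum_{xyz}\sigma_x(2i\hbar)p_x = -2\hbar(\sigma, p), \end{aligned} $$

or

$$ \{(\sigma, m) + \hbar\}(\sigma, p) + (\sigma, p)\{(\sigma, m) + \hbar\} = 0. $$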




Thus $(\sigma, m) + \hbar$ anticommutes with one of the terms in the expression (34) for $H$, namely the term $-c\rho_1(\sigma, p)$, and commutes with the other two. Since $\rho_3$ anticommutes with $\rho_1$ and commutes with $\sigma$, $m$ and $p$, it follows that $\rho_3\{(\sigma, m) + \hbar\}$ commutes with all three terms in $H$ and is a constant of the motion. But the square of $\rho_3\{(\sigma, m) + \hbar\}$ is also $M^2 + \frac14\hbar^2$. We can therefore take

$$ j\hbar = \rho_3\{(\sigma, m) + \hbar\}, \tag{36} $$

which gives us a convenient rational definition for $j$ which is consistent with (35) and makes $j$ a constant of the motion. The eigenvalues of this $j$ are all positive and negative integers, excluding zero.
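Because $\rho_3 = \pm1$ commutes with $(\sigma, m)$, the eigenvalues of $j$ in (36) are, up to sign, those of $\{(\sigma, m) + \hbar\}/\hbar$. A minimal sketch, assuming $\hbar = 1$ and restricting to one orbital multiplet with $m^2 = l(l+1)$ so that $m$ is represented by finite matrices, checks $\{(\sigma, m) + \hbar\}^2 = M^2 + \frac14\hbar^2$ and lists the eigenvalues of $(\sigma, m) + \hbar$.

```python
import numpy as np

PAULI = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]


def orbital_matrices(l):
    """m_x, m_y, m_z (hbar = 1) for an integer l, in the basis m_z = l, l-1, ..., -l."""
    mvals = np.arange(l, -l - 1, -1)
    n = 2 * l + 1
    m_plus = np.zeros((n, n), dtype=complex)
    for i in range(1, n):
        mm = mvals[i]  # m_plus takes m_z = mm (index i) to mm + 1 (index i - 1)
        m_plus[i - 1, i] = np.sqrt(l * (l + 1) - mm * (mm + 1))
    m_minus = m_plus.conj().T
    return [(m_plus + m_minus) / 2, (m_plus - m_minus) / 2j, np.diag(mvals).astype(complex)]


def j_pieces(l):
    """Return K = (sigma, m) + 1 and M^2 + 1/4 (hbar = 1) on the orbital-times-spin space."""
    m = orbital_matrices(l)
    n = 2 * l + 1
    K = sum(np.kron(mk, sk) for mk, sk in zip(m, PAULI)) + np.eye(2 * n)
    M = [np.kron(mk, np.eye(2)) + 0.5 * np.kron(np.eye(n), sk) for mk, sk in zip(m, PAULI)]
    return K, sum(Mk @ Mk for Mk in M) + 0.25 * np.eye(2 * n)


for l in range(4):
    K, M2_plus_quarter = j_pieces(l)
    eigenvalues = sorted(set(np.round(np.linalg.eigvalsh(K), 9).tolist()))
    print(l, eigenvalues, np.allclose(K @ K, M2_plus_quarter))
```

For each $l$ the eigenvalues are $l + 1$ (with multiplicity $2l + 2$) and $-l$ (with multiplicity $2l$, hence absent for $l = 0$), so with both signs of $\rho_3$ the values of $j$ run over every integer except zero.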

By a further application of (33), we get

$$ (\sigma, x)(\sigma, p) = (x, p) + i(\sigma, m) = rp_r + i\rho_3j\hbar, \tag{37} $$

with the help of (36) and also of equation (13) of Chapter VII. We introduce the dynamical variable $\epsilon$ defined by

$$ r\epsilon = \rho_1(\sigma, x). \tag{38} $$

Since $r$ commutes with $\rho_1$ and with $(\sigma, x)$, it must commute with $\epsilon$. We thus have

$$ r^2\epsilon^2 = \{\rho_1(\sigma, x)\}^2 = (\sigma, x)^2 = x^2 = r^2, $$

or $\epsilon^2 = 1$.


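Since there is symmetry between $x$ and $p$ so far as angular momentum is concerned, $\rho_1(\sigma, x)$, like $\rho_1(\sigma, p)$, must commute with $M$ and $j$. Hence $\epsilon$ commutes with $M$ and $j$. Further, $\epsilon$ must commute with $p_r$, since we have

$$ (\sigma, x)(x, p) - (x, p)(\sigma, x) = (\sigma, x(x, p) - (x, p)x) = i\hbar(\sigma, x), $$

which gives, on multiplication by $\rho_1$ and with the help of (38) and of $(x, p) = rp_r + i\hbar$,

$$ r\epsilon(rp_r + i\hbar) - (rp_r + i\hbar)r\epsilon = i\hbar r\epsilon $$

or, since $rp_r - p_rr = i\hbar$,

$$ r\epsilon(p_rr + 2i\hbar) - (rp_r + i\hbar)r\epsilon = i\hbar r\epsilon, $$

which reduces to

$$ \epsilon p_r - p_r\epsilon = 0. $$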


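From (37) and (38) we obtain

$$ r\epsilon\rho_1(\sigma, p) = rp_r + i\rho_3j\hbar $$

or

$$ \rho_1(\sigma, p) = \epsilon p_r + i\epsilon\rho_3j\hbar/r. $$

Thus (34) becomes

$$ H/c = -(e/c)A_0 - \epsilon p_r - i\epsilon\rho_3j\hbar/r - \rho_3mc. $$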




This gives our Hamiltonian expressed in terms of polar variables. It should be noticed that $\epsilon$ and $\rho_3$ commute with all the other variables occurring in $H$ and anticommute with one another. This means that we can take a representation in which $\epsilon$ and $\rho_3$ are represented respectively by the matrices

$$ \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \tag{39} $$

and in which $r$, say, is diagonal, and the representative $(r|)$ of a state will then have two components, $(r|)_a$ and $(r|)_b$, say, referring to the two rows and columns of the matrices (39).


74. The Fine-Structure of the Energy-Levels of Hydrogen

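We shall now take the case of the hydrogen atom, for which $A_0 = e/r$, and work out its energy-levels, given by the eigenvalues $H'$ of $H$. The equation $(H' - H)\psi = 0$ which defines these eigenvalues, when written in terms of representatives in the representation discussed above with $\epsilon$ and $\rho_3$ represented by the matrices (39), gives, since $p_r = -i\hbar(\partial/\partial r + 1/r)$ in a representation with $r$ diagonal, the equations

$$ \begin{aligned} \Big(\frac{H'}{c} + \frac{e^2}{cr} + mc\Big)(r|)_a - \hbar\Big(\frac{\partial}{\partial r} + \frac{j+1}{r}\Big)(r|)_b &= 0, \\ \Big(\frac{H'}{c} + \frac{e^2}{cr} - mc\Big)(r|)_b + \hbar\Big(\frac{\partial}{\partial r} - \frac{j-1}{r}\Big)(r|)_a &= 0. \end{aligned} $$

If we put

$$ \frac{\hbar}{mc + H'/c} = a_1, \qquad \frac{\hbar}{mc - H'/c} = a_2, \tag{40} $$

these equations reduce to

$$ \begin{aligned} \Big(\frac{1}{a_1} + \frac{\alpha}{r}\Big)(r|)_a - \Big(\frac{\partial}{\partial r} + \frac{j+1}{r}\Big)(r|)_b &= 0, \\ \Big(-\frac{1}{a_2} + \frac{\alpha}{r}\Big)(r|)_b + \Big(\frac{\partial}{\partial r} - \frac{j-1}{r}\Big)(r|)_a &= 0, \end{aligned} \tag{41} $$

where $\alpha = e^2/\hbar c$, which is a small number. We shall solve these equations by a similar method to that used for equation (20) in §41.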




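Put

$$ r(r|)_a = e^{-r/a}f, \qquad r(r|)_b = e^{-r/a}g, $$

introducing two new functions, $f$ and $g$, of $r$, where

$$ a = (a_1a_2)^{1/2} = \hbar(m^2c^2 - H'^2/c^2)^{-1/2}. \tag{42} $$

Equations (41) become

$$ \begin{aligned} \Big(\frac{1}{a_1} + \frac{\alpha}{r}\Big)f - \Big(\frac{d}{dr} - \frac1a + \frac{j}{r}\Big)g &= 0, \\ \Big(-\frac{1}{a_2} + \frac{\alpha}{r}\Big)g + \Big(\frac{d}{dr} - \frac1a - \frac{j}{r}\Big)f &= 0. \end{aligned} \tag{43} $$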


We now try for a solution in which $f$ and $g$ are in the form of power series,

$$ f = \sum_s c_sr^s, \qquad g = \sum_s c_s'r^s, \tag{44} $$

in which consecutive values of $s$ differ by unity though these values need not be integers. Substituting these expressions for $f$ and $g$ in (43) and picking out coefficients of $r^{s-1}$, we obtain

$$ \begin{aligned} c_{s-1}/a_1 + \alpha c_s - (s + j)c_s' + c_{s-1}'/a &= 0, \\ -c_{s-1}'/a_2 + \alpha c_s' + (s - j)c_s - c_{s-1}/a &= 0. \end{aligned} \tag{45} $$


By multiplying the first of these equations by $a$ and the second by $a_2$ and adding, we can eliminate both $c_{s-1}$ and $c_{s-1}'$, since from (42) $a/a_1 = a_2/a$. This gives

$$ [\alpha a + a_2(s - j)]c_s + [\alpha a_2 - a(s + j)]c_s' = 0, \tag{46} $$

a relation which shows the connexion between the primed and unprimed $c$'s.

The boundary condition at $r = 0$ requires that the series (44) shall terminate on the side of small $s$. If $s_0$ is the minimum value of $s$ for which $c_s$ and $c_s'$ do not both vanish, we obtain from (45), by putting $s = s_0$ and $c_{s_0-1} = c_{s_0-1}' = 0$,

$$ \begin{aligned} \alpha c_{s_0} - (s_0 + j)c_{s_0}' &= 0, \\ \alpha c_{s_0}' + (s_0 - j)c_{s_0} &= 0, \end{aligned} $$

which give $\alpha^2 = -s_0^2 + j^2$ (the determinant of these two homogeneous equations must vanish). Since the boundary condition requires that the minimum value of $s$ shall be greater than zero, we must take

$$ s_0 = +\sqrt{j^2 - \alpha^2}. $$




To investigate the convergence of the series (44) we shall determine the ratio $c_s/c_{s-1}$ for large $s$. Equation (46) and the second of equations (45) give approximately, when $s$ is large,

$$ a_2c_s = ac_s' $$

and

$$ sc_s = c_{s-1}/a + c_{s-1}'/a_2. $$

Hence

$$ c_s/c_{s-1} = 2/as. $$

The series (44) will therefore converge like

$$ \sum_s\frac{1}{s!}\Big(\frac{2r}{a}\Big)^s, $$

or $e^{2r/a}$. This result is similar to that obtained in §41 and allows us to infer, as in §41, that all values of $H'$ are permissible for which $a$ is imaginary†, i.e., from (42), for which $H' > mc^2$, but of those values of $H'$ for which $a$ is real, only those are permissible for which the series (44) terminate on the side of large $s$.

If the series (44) terminate with the terms $c_s$ and $c_s'$, so that $c_{s+1} = c_{s+1}' = 0$, we obtain from (45) with $s + 1$ substituted for $s$

$$ \begin{aligned} c_s/a_1 + c_s'/a &= 0, \\ -c_s'/a_2 - c_s/a &= 0. \end{aligned} $$

These two equations are equivalent on account of (42). When combined with (46), they give

$$ a_1[\alpha a + a_2(s - j)] = a[\alpha a_2 - a(s + j)], $$

which reduces to

$$ 2a_1a_2s = \alpha a(a_2 - a_1) $$

or

$$ \frac{s}{\alpha} = \frac{a}{2}\Big(\frac{1}{a_1} - \frac{1}{a_2}\Big) = \frac{aH'}{c\hbar}, $$

with the help of (40). Squaring and using (42), we obtain

$$ (m^2c^2 - H'^2/c^2)\,s^2 = \alpha^2H'^2/c^2. $$

Hence

$$ \frac{H'}{mc^2} = \Big(1 + \frac{\alpha^2}{s^2}\Big)^{-1/2}. $$

† [‘pure’ omitted]


The $s$ here, which specifies the last term in the series, must be greater than $s_0$ by some integer not less than zero. Calling this integer $n$, we have

$$ s = n + \sqrt{j^2 - \alpha^2} $$

and thus

$$ \frac{H'}{mc^2} = \Big\{1 + \frac{\alpha^2}{\big(n + \sqrt{j^2 - \alpha^2}\big)^2}\Big\}^{-1/2}. \tag{47} $$

This formula gives the discrete energy-levels of the hydrogen spectrum and was first obtained by Sommerfeld working with Bohr's orbit theory. There are two quantum numbers $n$ and $j$ involved, but owing to $\alpha^2$ being very small the energy depends almost entirely on $n + |j|$. Values of $n$ and $|j|$ that give the same $n + |j|$ give rise to a set of energy-levels lying very close to one another, and to the energy-level given by the non-relativistic formula (27) of §41 with $s = n + |j|$.

For a general value of $n$, $j$ can have any integral value except zero. The value $n = 0$ is, however, exceptional as it makes equation (46) vanish identically. A closer investigation shows that in this case only negative values for $j$ are allowed.⁸
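The chain from (40) to (47) can be checked numerically. The sketch below assumes units $\hbar = m = c = 1$ and uses CODATA values of $\alpha$ and $mc^2$, which are not given in the text. It evaluates (47), then runs the recurrence (45) upward from $s_0 = \sqrt{j^2 - \alpha^2}$ with $H'$ taken from (47) and measures how nearly the series stops after $n$ steps. The expectation from the argument above is that it stops for every $n \ge 1$ and, when $n = 0$, only for negative $j$.

```python
import math

ALPHA = 7.2973525693e-3  # fine-structure constant e^2/(hbar c), CODATA 2018
MC2_EV = 510998.95       # electron rest energy mc^2 in eV, CODATA 2018


def sommerfeld_energy(n, j, alpha=ALPHA):
    """H'/mc^2 from (47) for an integer n >= 0 and a nonzero integer j."""
    s = n + math.sqrt(j * j - alpha**2)
    return (1.0 + alpha**2 / s**2) ** -0.5


def termination_residual(n, j, alpha=ALPHA):
    """Run (45) upward from s0 with H' from (47) (units hbar = m = c = 1) and return a
    relative measure of how far c_{s+1}, c'_{s+1} are from zero after n steps."""
    E = sommerfeld_energy(n, j, alpha)
    a1, a2 = 1.0 / (1.0 + E), 1.0 / (1.0 - E)  # definitions (40)
    a = math.sqrt(a1 * a2)                     # definition (42)
    s = math.sqrt(j * j - alpha**2)            # s0
    c, cp = s + j, alpha                       # satisfies alpha c - (s0 + j) c' = 0
    for _ in range(n):
        s += 1.0
        r1 = -c / a1 - cp / a   # first of (45):  alpha c_s - (s + j) c'_s = r1
        r2 = cp / a2 + c / a    # second of (45): (s - j) c_s + alpha c'_s = r2
        det = alpha**2 + (s + j) * (s - j)
        c, cp = (alpha * r1 + (s + j) * r2) / det, (alpha * r2 - (s - j) * r1) / det
    # c_{s+1} = c'_{s+1} = 0 requires c_s/a1 + c'_s/a = 0 (first of (45) with s + 1 for s).
    return abs(c / a1 + cp / a) / (abs(c / a1) + abs(cp / a))


binding = (1.0 - sommerfeld_energy(0, -1)) * MC2_EV
split = abs(sommerfeld_energy(1, 1) - sommerfeld_energy(0, -2)) * MC2_EV
print(f"n = 0, j = -1 binding energy: {binding:.4f} eV")
print(f"splitting within n + |j| = 2: {split:.3e} eV")
for n, j in [(0, -1), (0, 1), (1, 1), (1, -1), (2, -2)]:
    print(n, j, termination_residual(n, j))
```

The residual is at rounding level for every pair except $n = 0$, $j = +1$, where it is of order one. The binding energy comes out near 13.6 eV, and the splitting of the two levels with $n + |j| = 2$ is about $4.5\times10^{-5}$ eV, illustrating how close together such levels lie.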


75. Theory of the Positron 


It has been mentioned in §69 that the wave equation for the electron admits of twice as many solutions as it ought to, half of them referring to states with negative values for the kinetic energy $W + eA_0$. This difficulty was introduced as soon as we passed from equation (3) to equation (4) and is inherent in any relativistic theory. It occurs also in classical relativistic theory, but is not then serious since, owing to the continuity in the variation of all classical dynamical variables, if the kinetic energy $W + eA_0$ is initially positive (when it must be greater than or equal to $mc^2$), it cannot subsequently be negative (when it would have to be less than or equal to $-mc^2$). In the quantum theory, however, discontinuous
transitions may take place, so that if the electron is initially in a state of positive 
kinetic energy it may make a transition to a state of negative kinetic energy. It is 
therefore no longer permissible simply to ignore the negative-energy states, as one 
can do in the classical theory. 
⁸ See Walter Gordon (1928), „Die Energieniveaus des Wasserstoffatoms nach der Diracschen Quantentheorie des Elektrons“, Zeitschrift für Physik 48(1-2), 11-14. doi:10.1007/bf01351570

Let us examine the negative-energy solutions of the equation

$$ \Big\{\frac{W}{c} + \frac{e}{c}A_0 + \alpha_x\Big(p_x + \frac{e}{c}A_x\Big) + \alpha_y\Big(p_y + \frac{e}{c}A_y\Big) + \alpha_z\Big(p_z + \frac{e}{c}A_z\Big) + \alpha_mmc\Big\}\psi = 0 \tag{48} $$




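a little more closely. For this purpose it is convenient to use a representation of the $\alpha$'s in which all the elements of the matrices representing $\alpha_x$, $\alpha_y$ and $\alpha_z$ are real and all those of the matrix representing $\alpha_m$ are imaginary.† Such a representation may be obtained, for instance, from that of §69 by interchanging the expressions for $\alpha_y$ and $\alpha_m$ in (7). With such a representation, if we write $-i$ for $i$ in the operator of equation (48), we get, remembering (1) and (2), and changing the sign of the whole operator,

$$ \Big\{\frac{W}{c} - \frac{e}{c}A_0 + \alpha_x\Big(p_x - \frac{e}{c}A_x\Big) + \alpha_y\Big(p_y - \frac{e}{c}A_y\Big) + \alpha_z\Big(p_z - \frac{e}{c}A_z\Big) + \alpha_mmc\Big\}\psi = 0. \tag{49} $$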
Thus the conjugate complex of any wave function that is a solution of (48) is a solution of (49). Further, if the solution of (48) belongs to a negative value for $W + eA_0$, the conjugate complex solution of (49) will belong to a positive value for $W - eA_0$. But equation (49) is just what one would get if one substituted $-e$ for $e$ in (48). It follows that the conjugate complex of any solution of (48) belonging to a negative value for $W + eA_0$ is a solution, belonging to a positive value for $W - eA_0$, of the wave equation obtained from (48) by substitution of $-e$ for $e$, and therefore represents an electron of charge $+e$, instead of $-e$, moving through the given electromagnetic field. Thus the unwanted solutions of (48) are connected with the motion of an electron with a charge $+e$. (It is not possible, of course, with an arbitrary electromagnetic field, to separate the solutions of (48) definitely into those referring to positive and those referring to negative values for $W + eA_0$, as such a separation would imply that transitions from one kind to the other do not occur. The preceding discussion is therefore only a rough one, applying to the case when such a separation is approximately possible.)
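The argument uses only two facts: complex conjugation reverses the signs of $W$ and $p$, and, in the chosen representation, it reverses the sign of $\alpha_m$ alone. A minimal numerical sketch, assuming units $\hbar = c = m = 1$, constant potentials (so that plane waves are exact solutions and the split into positive and negative $W + eA_0$ is clean), and a representation obtained by relabeling the standard Dirac matrices, checks this for a single negative-energy plane wave. All numerical values are illustrative.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2, Z2 = np.eye(2, dtype=complex), np.zeros((2, 2), dtype=complex)


def off_diag(s):
    """Return the block matrix [[0, s], [s, 0]]."""
    return np.block([[Z2, s], [s, Z2]])


beta = np.block([[I2, Z2], [Z2, -I2]])
# A standard representation has real alpha_x, alpha_z, beta and imaginary alpha_y.
# Interchanging the roles of alpha_y and alpha_m gives real alpha_x, alpha_y, alpha_z
# and a pure imaginary alpha_m.
alphas = [off_diag(sx), beta, off_diag(sz)]
alpha_m = off_diag(sy)


def operator_48(W, p, e, A0, A, m=1.0):
    """Matrix that (48) applies to the spinor of a plane wave with energy W, momentum p."""
    kinetic = sum(a * (pk + e * Ak) for a, pk, Ak in zip(alphas, p, A))
    return (W + e * A0) * np.eye(4) + kinetic + alpha_m * m


e, A0 = 1.0, 0.7
A = np.array([0.1, -0.2, 0.4])
p = np.array([0.3, 0.8, -0.5])

# W + eA0 = -lam, with lam > 0 the largest eigenvalue of the W-independent part.
lam_all, vecs = np.linalg.eigh(operator_48(0.0, p, e, 0.0, A))
lam, u = lam_all[-1], vecs[:, -1]
W = -lam - e * A0

# psi = u exp(-i(Wt - p.x)) conjugates to energy -W, momentum -p, spinor u*.
res_48 = operator_48(W, p, e, A0, A) @ u
res_49 = operator_48(-W, -p, -e, A0, A) @ u.conj()  # (48) with -e written for e
print(W + e * A0, -W - e * A0)  # kinetic energies: negative, then positive
print(np.abs(res_48).max(), np.abs(res_49).max())  # both at rounding level
```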

In this way we are led to infer that the negative-energy solutions of (48) 
refer to the motion of a new kind of particle having the mass of an electron 
and the opposite charge. Such particles have been observed experimentally and 
are called positrons. We cannot, however, simply assert that the negative-energy 
solutions represent positrons, as this would make the dynamical relations all wrong. 
For instance, it is certainly not true that a positron has a negative kinetic 
energy. We must therefore establish the theory of the positrons on a somewhat 
different footing. We assume that nearly all the negative-energy states are occupied, 
with one electron in each state in accordance with the exclusion principle of Pauli. 
An unoccupied negative-energy state will now appear as something with a positive 
energy, since to make it disappear, i.e. to fill it up, we should have to add to it 
an electron with negative energy. We assume that these unoccupied negative-energy 
states are the positrons. 

These assumptions require there to be a distribution of electrons of infinite density everywhere in the world. A perfect vacuum is a region where all the states of positive energy are unoccupied and all those of negative energy are occupied. In a perfect vacuum Maxwell's equation

$$ \operatorname{div}\mathcal E = 0 $$

must, of course, be valid. This means that the infinite distribution of negative-energy electrons does not contribute to the electric field. Only departures from the distribution in a vacuum will contribute to the electric density $\rho$ in Maxwell's equation

$$ \operatorname{div}\mathcal E = 4\pi\rho. $$

Thus there will be a contribution $-e$ for each occupied state of positive energy and a contribution $+e$ for each unoccupied state of negative energy.

† [‘pure’ omitted]

The exclusion principle will operate to prevent a positive-energy electron 
ordinarily from making transitions to states of negative energy. It will still 
be possible, however, for such an electron to drop into an unoccupied state 
of negative energy. In this case we should have an electron and positron 
disappearing simultaneously, their energy being emitted in the form of radiation. 
The converse process would consist in the creation of an electron and a positron 
from electromagnetic radiation. 

The theory of the positron here given appears at first sight to treat the electrons 
and positrons on very different footings, but actually the fundamental ideas of 
the theory are symmetrical between the electrons and positrons. We should 
have an equivalent theory if we supposed the positrons to be the basic particles, 
described by wave equations of the form (9) with —e for e, and then supposed 
that nearly all the states of negative energy for the positron are filled up, a hole in 
the distribution of negative-energy positrons being then interpreted as an ordinary 
electron. The theory could be developed consistently with the hypothesis that all 
the laws of physics are symmetrical between positive and negative electric charge. 


XIII. FIELD THEORY


76. Quantum Conditions for the Electromagnetic Field 


The methods of classical mechanics can be applied, not only to particles, but to
the vibrations of a field such as the electromagnetic field. One can introduce 
dynamical variables to describe the field and can set up a Hamiltonian function 
which enables the equations of motion to be expressed in the Hamiltonian form. 
There exists a corresponding quantum mechanics of fields. It is of interest chiefly 
because of the mathematical beauty of its formal analogy with the classical theory 
when it is expressed in symbolic form. It has not so far led to any practical results 
which could not be obtained by more elementary methods. 

We shall here deal with the quantum theory of the electromagnetic field. 
The foundations of this theory have already been given in Chapter XI, 
where we resolved the field into plane waves and treated the amplitudes and phases 
of these waves as dynamical variables. The present theory will go beyond that 
of Chapter XI in that the field quantities themselves will be used as dynamical 
variables, not merely the amplitudes and phases of their Fourier components, 
and the whole of the mutual interaction between electrified particles, including 
also the Coulomb interaction, will be shown to follow from the interaction between 
the particles and the field. The present theory will be relativistic throughout and
we shall take the velocity of light $c$ equal to unity.

We shall work for the present with the Heisenberg picture of §32, in order that 
we may have our dynamical variables satisfying equations of motion analogous 
to those of classical mechanics. When we use the field quantities as dynamical 
variables, the first problem that presents itself is to obtain their quantum conditions, a problem which was first solved by Jordan and Pauli.† The general solution of this problem would require us to obtain the commutability relation connecting any two field quantities at any two points of space-time $x', t'$ and $x'', t''$. For building up a dynamical theory, though, it is sufficient to obtain
the commutability relations connecting all the field quantities at one instant 
of time, corresponding to the fact that in our particle dynamics we had to obtain 


† Pascual Jordan and Wolfgang Pauli, „Zur Quantenelektrodynamik ladungsfreier Felder“, Zeitschrift für Physik 47, 151-173 (1928). https://doi.org/10.1007/BF02055793


only those quantum conditions connecting the dynamical variables at one time. 
The more general commutability relations would then be determined by these, 
together with the equations of motion. From general grounds we should expect 
two field quantities at two points in space-time, neither of which lies inside or on 
the light-cone from the other, to commute, since one can measure either of these 
quantities without disturbing the other, on account of the velocity of propagation 
of the disturbance being limited by the requirements of relativity to the velocity of 
light. When we work out the P.B.’s connecting the field quantities at various points 
of space at the same time, we shall find, in agreement with the above expectation, 
that two field quantities commute unless they are at two points infinitely close to 
one another. 

We may assume that the commutability relations connecting the field quantities 
at one instant of time are independent of whether there are charged particles 
present or not, since in our theory of particle dynamics we had the same quantum 
conditions for a system whether that system interacts with another system or not. 
Thus we may work with the case of no charged particles present, when the whole 
of the electromagnetic field can be resolved into plane waves, and use the form 
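for $\mathcal E$ and $\mathcal H$ given by (36) of Chapter XI, namely, with $t = 0$,

$$ \mathcal E = \int\mathcal E_a\cos[\gamma_a - (k_a, x)]\,dk_a, \qquad \mathcal H = \int\mathcal H_a\cos[\gamma_a - (k_a, x)]\,dk_a. \tag{1} $$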


It is convenient to pass from the continuous range of values of $k_a$ to a discrete set, corresponding to the discrete set of photon states that we had in §64, and so to replace the integrals in (1) by sums. We then get, with $s_a$ having the same meaning as in §64,

$$ \mathcal E = \sum_a\mathcal E_a\cos[\gamma_a - (k_a, x)]\,s_a, \qquad \mathcal H = \sum_a\mathcal H_a\cos[\gamma_a - (k_a, x)]\,s_a. $$


The length of the vector $\mathcal E_a$ or $\mathcal H_a$ is given by (38) and (32) of Chapter XI to be

$$ |\mathcal E_a| = |\mathcal H_a| = \pi^{-1}(h\nu_a\eta_a/s_a)^{1/2}. \tag{2} $$


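Thus if we let $\alpha_a$ and $\beta_a$ denote vectors of unit length in the directions of $\mathcal E_a$ and $\mathcal H_a$ respectively, we have

$$ \begin{aligned} \mathcal E &= \pi^{-1}\sum_a(h\nu_a)^{1/2}\alpha_a\eta_a^{1/2}\cos[\gamma_a - (k_a, x)]\,s_a^{1/2}, \\ \mathcal H &= \pi^{-1}\sum_a(h\nu_a)^{1/2}\beta_a\eta_a^{1/2}\cos[\gamma_a - (k_a, x)]\,s_a^{1/2}. \end{aligned} \tag{3} $$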


These expressions for $\mathcal E$ and $\mathcal H$ hold in the classical theory. To pass over to the quantum theory, we must consider the $\eta$'s and $\gamma$'s in (3) to be non-commuting variables satisfying quantum conditions like (10) of Chapter XI with $\gamma$ instead of $w$. The expressions on the right-hand side of (3) are then no longer real. To make them real, we have to replace $2\eta_a^{1/2}\cos[\gamma_a - (k_a, x)]$ by

$$ \eta_a^{1/2}e^{i\gamma_a}e^{-i(k_a, x)} + e^{-i\gamma_a}\eta_a^{1/2}e^{i(k_a, x)} $$

or by

$$ e^{i\gamma_a}\eta_a^{1/2}e^{-i(k_a, x)} + \eta_a^{1/2}e^{-i\gamma_a}e^{i(k_a, x)}. $$

We choose the first of these alternatives since the second, in a representation in which the $\eta$'s are diagonal, would make all matrix elements of $\mathcal E$ and $\mathcal H$ lying in a row or a column for which an $\eta_a' = 0$ vanish, and thus 0 would cease to be an effective eigenvalue of the $\eta$'s. The second alternative would not give a different physical theory, but would merely mean working with variables $\eta$ which are greater by one than the numbers of photons in the various states. Equations (3) now become


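$$ \begin{aligned} \mathcal E &= (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\alpha_a\{\eta_a^{1/2}e^{i\gamma_a}e^{-i(k_a, x)} + e^{-i\gamma_a}\eta_a^{1/2}e^{i(k_a, x)}\}\,s_a^{1/2} \\ &= (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\alpha_a\{\bar\xi_ae^{-i(k_a, x)} + \xi_ae^{i(k_a, x)}\}\,s_a^{1/2}, \end{aligned} \tag{4} $$

with the introduction of variables $\xi_a$ and $\bar\xi_a$ like those of equations (13), Chapter XI, and similarly,

$$ \mathcal H = (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\beta_a\{\bar\xi_ae^{-i(k_a, x)} + \xi_ae^{i(k_a, x)}\}\,s_a^{1/2}. \tag{5} $$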


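Let $\mathcal E_l$ be the component of $\mathcal E$ in a certain direction $l$ and $\mathcal E_m$ the component of $\mathcal E$ in another direction $m$, which may, as a special case, be the same as the direction $l$. We shall now work out the P.B. connecting $\mathcal E_l$ at the point $x', y', z'$, which we write as $\mathcal E_l'$, with $\mathcal E_m$ at the point $x'', y'', z''$, which we write $\mathcal E_m''$. Since $\xi$'s and $\bar\xi$'s with different suffixes always commute with each other, we obtain from (4)

$$ \begin{aligned} [\mathcal E_l', \mathcal E_m''] &= (2\pi)^{-2}\sum_a h\nu_a\alpha_{la}\alpha_{ma}\{[\bar\xi_a, \xi_a]e^{-i(k_a, x' - x'')} + [\xi_a, \bar\xi_a]e^{i(k_a, x' - x'')}\}\,s_a \\ &= (2\pi)^{-2}\sum_a h\nu_a\alpha_{la}\alpha_{ma}(i/\hbar)\{e^{-i(k_a, x' - x'')} - e^{i(k_a, x' - x'')}\}\,s_a, \end{aligned} $$

from the quantum condition

$$ \xi_a\bar\xi_b - \bar\xi_b\xi_a = \delta_{ab}, \tag{6} $$

which comes from (15) of Chapter XI. This reduces to

$$ [\mathcal E_l', \mathcal E_m''] = \pi^{-1}\sum_a\nu_a\alpha_{la}\alpha_{ma}\sin(k_a, x' - x'')\,s_a. \tag{7} $$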
When we do the summation over all values of a here, each value of a meaning 
a direction of motion and frequency of a Fourier component of the field, 
together with a direction of polarization, we shall evidently get the result zero, 
since each term in the sum will just cancel with the term corresponding to 
the opposite direction of motion, the same frequency and the same direction 
of the electric vector. Hence the electric field quantities at different places and 
the same time all commute with each other. Similarly, working from (5), it may 
be shown that the magnetic field quantities at different places and the same time 
all commute with each other. 

It remains for us to determine the P.B. connecting the electric field at one 
place with the magnetic field at another. By similar work to that which led to (7) 
we obtain 

$$ [\mathcal E_l', \mathcal H_m''] = \pi^{-1}\sum_a\nu_a\alpha_{la}\beta_{ma}\sin(k_a, x' - x'')\,s_a. \tag{8} $$


Let us first do the summation here with respect to both states of polarization for a given direction of motion and frequency. The state of polarization is specified by the two mutually perpendicular vectors $\alpha$ and $\beta$, giving the directions of the electric and magnetic fields, and to pass from one state of polarization to the other we merely have to replace $\alpha$ by $\beta$ and $\beta$ by $-\alpha$. Thus to sum over both states of polarization in (8), we have to replace $\alpha_{la}\beta_{ma}$ by

$$ \alpha_{la}\beta_{ma} - \beta_{la}\alpha_{ma}. \tag{9} $$

If the two directions $l$ and $m$ are the same, expression (9) vanishes and thus the right-hand side of (8) vanishes. Hence components of $\mathcal E$ and $\mathcal H$ in parallel directions always commute.

We shall now take the case when the directions $l$ and $m$ are perpendicular and suppose them, for definiteness, to be the $x$ and $y$ directions. Expression (9) then becomes

$$ \alpha_{xa}\beta_{ya} - \beta_{xa}\alpha_{ya}, $$




which is just the $z$-component of the vector product $\alpha_a\times\beta_a$. Since this vector product is of unit length and is in the direction of the vector $k_a$, its $z$-component is the cosine of the angle between the vector $k_a$ and the $z$-axis and is thus $k_{az}/2\pi\nu_a$. Hence equation (8) becomes

$$ [\mathcal E_x', \mathcal H_y''] = (2\pi^2)^{-1}\sum_a k_{az}\sin(k_a, x' - x'')\,s_a. $$

Passing from the sum back to an integral, we get

$$ \begin{aligned} [\mathcal E_x', \mathcal H_y''] &= (2\pi^2)^{-1}\int k_z\sin(k, x' - x'')\,dk \\ &= -(2\pi^2)^{-1}\frac{\partial}{\partial z'}\int\cos(k, x' - x'')\,dk \\ &= -4\pi\frac{\partial}{\partial z'}\{\delta(x' - x'')\,\delta(y' - y'')\,\delta(z' - z'')\} \end{aligned} $$

with the help of formula (15) of Chapter IV. Thus we obtain finally

$$ \begin{aligned} [\mathcal E_x', \mathcal H_y''] &= -4\pi\,\delta(x' - x'')\,\delta(y' - y'')\,\delta'(z' - z''), \\ [\mathcal E_x', \mathcal H_z''] &= 4\pi\,\delta(x' - x'')\,\delta'(y' - y'')\,\delta(z' - z''), \end{aligned} \tag{10} $$

with corresponding relations for $\mathcal E_y$ and $\mathcal E_z$. This gives us all the quantum conditions for the field quantities at a definite time. Two of these field quantities always commute if they are at two points in space which are not infinitely close.
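Two geometric facts carry this argument: in (7) each term is cancelled by the term with the opposite direction of motion, the same frequency and the same electric vector, and summing (8) over both polarization states gives (9), whose $(x, y)$ component is the $z$-component of $\alpha_a\times\beta_a = k_a/|k_a|$. The sketch below checks both with randomly chosen wave vectors. It takes $\nu_a$ proportional to $|k_a|$ (with $c = 1$ the factor $2\pi$ plays no role) and sets every $s_a$ to one; both are assumptions made only for illustration.

```python
import numpy as np


def polarization_pair(k, rng):
    """Unit vectors (alpha, beta) with alpha perpendicular to k and beta = k_hat x alpha."""
    k_hat = k / np.linalg.norm(k)
    v = rng.normal(size=3)
    alpha = v - (v @ k_hat) * k_hat
    alpha /= np.linalg.norm(alpha)
    return alpha, np.cross(k_hat, alpha)


rng = np.random.default_rng(1)

# (9): alpha_l beta_m summed over both polarization states (alpha -> beta, beta -> -alpha).
k = rng.normal(size=3)
alpha, beta = polarization_pair(k, rng)
S = np.outer(alpha, beta) + np.outer(beta, -alpha)  # S[l, m] = alpha_l beta_m - beta_l alpha_m

# (7): pair each mode with the opposite direction of motion and the same electric vector.
modes = []
for kv in rng.normal(size=(5, 3)):
    a_vec, _ = polarization_pair(kv, rng)
    modes += [(kv, a_vec), (-kv, a_vec)]
r = rng.normal(size=3)  # plays the role of x' - x''
sum_7 = sum(np.linalg.norm(kv) * np.outer(a_vec, a_vec) * np.sin(kv @ r) for kv, a_vec in modes)

print(np.round(S, 12))     # zero diagonal; S[0, 1] equals k_z/|k|
print(np.abs(sum_7).max())  # zero up to rounding
```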
The total energy of the field in the absence of any charged particles is

$$ H_F = \sum_a h\nu_a\eta_a = \sum_a h\nu_a\bar\xi_a\xi_a \tag{11} $$

from (14) of Chapter XI. From (4)


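$$ \begin{aligned} \int\mathcal E^2\,dx &= (2\pi)^{-2}h\sum_{a,b}(\nu_a\nu_b)^{1/2}(\alpha_a, \alpha_b)\int\{\bar\xi_ae^{-i(k_a, x)} + \xi_ae^{i(k_a, x)}\}\{\bar\xi_be^{-i(k_b, x)} + \xi_be^{i(k_b, x)}\}\,dx\;s_a^{1/2}s_b^{1/2} \\ &= 2\pi h\sum_{a,b}(\nu_a\nu_b)^{1/2}(\alpha_a, \alpha_b)\{(\bar\xi_a\bar\xi_b + \xi_a\xi_b)\,\delta(k_a + k_b) + (\bar\xi_a\xi_b + \xi_a\bar\xi_b)\,\delta(k_a - k_b)\}\,s_a^{1/2}s_b^{1/2}, \end{aligned} \tag{12} $$

the $\delta$ function of a vector having the same meaning as in equation (19) of Chapter IX. Similarly,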


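$$ \int\mathcal H^2\,dx = 2\pi h\sum_{a,b}(\nu_a\nu_b)^{1/2}(\beta_a, \beta_b)\{(\bar\xi_a\bar\xi_b + \xi_a\xi_b)\,\delta(k_a + k_b) + (\bar\xi_a\xi_b + \xi_a\bar\xi_b)\,\delta(k_a - k_b)\}\,s_a^{1/2}s_b^{1/2}. $$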


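Hence

$$ \int(\mathcal E^2 + \mathcal H^2)\,dx = 2\pi h\sum_{a,b}(\nu_a\nu_b)^{1/2}\{(\alpha_a, \alpha_b) + (\beta_a, \beta_b)\}(\bar\xi_a\xi_b + \xi_a\bar\xi_b)\,\delta(k_a - k_b)\,s_a^{1/2}s_b^{1/2}, $$

the remaining terms cancelling since $(\alpha_a, \alpha_b) = -(\beta_a, \beta_b)$ when the vectors $k_a$ and $k_b$ are in opposite directions. This result reduces to

$$ \int(\mathcal E^2 + \mathcal H^2)\,dx = 4\pi h\sum_a\nu_a(\xi_a\bar\xi_a + \bar\xi_a\xi_a), $$

since, as is easily verified, $\delta(k_a - k_b)s_a^{1/2}s_b^{1/2}$ occurring in a sum over $a$ or $b$ is equivalent to $\delta_{k_ak_b}$, while $(\alpha_a, \alpha_b) + (\beta_a, \beta_b)$ equals 2 when $b = a$ and vanishes when $a$ and $b$ are the two polarization states belonging to the same $k$. With the help of (6) we now find

$$ \int(\mathcal E^2 + \mathcal H^2)\,dx = 8\pi h\sum_a\nu_a(\bar\xi_a\xi_a + \tfrac12), $$

and hence the energy (11) becomes

$$ H_F = (1/8\pi)\int(\mathcal E^2 + \mathcal H^2)\,dx - \sum_a\tfrac12h\nu_a. \tag{13} $$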

Thus the classical expression for the energy of the electromagnetic field, given by (37) of Chapter XI, holds in the quantum theory only when another term is added to it, consisting of a contribution $-\frac12h\nu$ for each degree of freedom of the field. This extra term is infinite, but it is a constant, independent of all the dynamical variables, and may therefore be neglected when (13) is used as a Hamiltonian.
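The extra term in (13) comes from the ordering of $\xi_a$ and $\bar\xi_a$: the field energy built from $\mathcal E^2 + \mathcal H^2$ contains the symmetric product $\frac12(\xi_a\bar\xi_a + \bar\xi_a\xi_a)$ per unit $h\nu_a$, while (11) contains $\bar\xi_a\xi_a$. A small sketch, assuming a single degree of freedom and a truncated photon-number basis (a truncation that spoils (6) only in the last row and column), shows the difference of $\frac12$.

```python
import numpy as np

N_MAX = 8  # size of the truncated photon-number basis for one degree of freedom (illustrative)

# xi lowers the photon number by one: xi |n> = sqrt(n) |n - 1>; xi_bar is its adjoint.
xi = np.diag(np.sqrt(np.arange(1, N_MAX)), k=1)
xi_bar = xi.conj().T

eta = xi_bar @ xi                                   # photon number: diag(0, 1, ..., N_MAX - 1)
commutator = xi @ xi_bar - xi_bar @ xi              # (6) with a = b, except at the truncation edge
symmetric = 0.5 * (xi @ xi_bar + xi_bar @ xi)       # what (1/8pi) int (E^2 + H^2) dx gives per h nu

# Away from the truncation edge, symmetric = eta + 1/2, the source of -1/2 h nu in (13).
inner = slice(0, N_MAX - 1)
print(np.diag(symmetric)[inner] - np.diag(eta)[inner])  # all 0.5
```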

The equations of motion of the field may be deduced directly from the quantum 
conditions for the field quantities and the form (13) for the Hamiltonian, 
without any resolution of the field into Fourier components. Thus, for example, we get as the equation of motion for $\mathcal E_x'$

$$ \dot{\mathcal E}_x' = [\mathcal E_x', H_F] = (1/8\pi)\int[\mathcal E_x', \mathcal E''^2 + \mathcal H''^2]\,dx''. \tag{14} $$




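Now from (10)

$$ \begin{aligned} [\mathcal E_x', \mathcal H_y''^2] &= [\mathcal E_x', \mathcal H_y'']\,\mathcal H_y'' + \mathcal H_y''\,[\mathcal E_x', \mathcal H_y''] \\ &= -8\pi\mathcal H_y''\,\delta(x' - x'')\,\delta(y' - y'')\,\delta'(z' - z''), \end{aligned} $$

and hence

$$ \int[\mathcal E_x', \mathcal H_y''^2]\,dx'' = -8\pi\int\mathcal H_y''\,\delta(x' - x'')\,\delta(y' - y'')\,\delta'(z' - z'')\,dx''dy''dz'' = -8\pi\frac{\partial\mathcal H_y'}{\partial z'} $$

from (4) and (6) of Chapter IV. Similarly,

$$ \int[\mathcal E_x', \mathcal H_z''^2]\,dx'' = 8\pi\frac{\partial\mathcal H_z'}{\partial y'}. $$

Thus, since $\mathcal E_x'$ commutes with all components of $\mathcal E''$ and with $\mathcal H_x''$, (14) reduces to

$$ \dot{\mathcal E}_x' = \frac{\partial\mathcal H_z'}{\partial y'} - \frac{\partial\mathcal H_y'}{\partial z'}, \tag{15} $$

which is one of Maxwell's equations.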


77. Quantum Conditions for the Electromagnetic Potentials


The foregoing theory of the quantum conditions for the electromagnetic field 
quantities @ and # in a vacuum must now be extended to include the quantum 
conditions for the potentials Ap and A. It might be thought that such an extension 
is not necessary, since only the field quantities & and # are physically significant. 
The potentials appear, however, in the equations describing the interaction of 
a charged particle with the field, so that when we come to take the presence of 
charged particles into account, as we shall do later, we shall need to know all about 
the potentials. 

The problem of including the potentials in our theory is not a straightforward 
one, owing to the potentials not being uniquely determined in terms of the field 
quantities € and #. The arbitrariness in the potentials can be reduced by 
imposing on them the condition that their four-dimensional divergence shall 
vanish, i.e. 

OAo/Ot + div A = 0, (16) 


but even then they are not completely determined. In Chapter XI we made them 
definite by taking $A_0 = 0$, but such an assumption is not relativistic and so cannot
be made here. For the present we shall ignore equation (16) and the arbitrariness 
in the potentials and shall return to these questions in the next section. 

We express the potentials in terms of their Fourier components, as we did
the field quantities $\mathcal{E}$ and $\mathcal{H}$ in (1), thus


$$A_\mu = \int A_{\mu\mathbf{k}}\cos[\gamma_{\mu\mathbf{k}} + 2\pi\nu_k t - (\mathbf{k},\mathbf{x})]\,d^3k, \qquad (17)$$


where the suffix $\mu$ takes on any of the values 0, $x$, $y$, $z$. It is necessary to do this
Fourier resolution for all time, and not merely for the time $t = 0$, as we did in (1),
owing to the fact that the potentials are not determined throughout all time if
they are given at one time, as is the case with $\mathcal{E}$ and $\mathcal{H}$. The amplitudes $A_{\mu\mathbf{k}}$
in (17) are specified by the suffix $\mathbf{k}$, and not by a suffix $a$ as in (1), since there is
only one of them corresponding to any value of $\mathbf{k}$, and not two, referring to two
different states of polarization, as in (1). We again pass from integrals to sums
and get, corresponding to the forms (4) and (5) for $\mathcal{E}$ and $\mathcal{H}$,


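$$A_\mu = \sum_{\mathbf{k}} A_{\mu\mathbf{k}}\cos[\gamma_{\mu\mathbf{k}} + 2\pi\nu_k t - (\mathbf{k},\mathbf{x})] = \sum_{\mathbf{k}}\{\eta_{\mu\mathbf{k}}e^{-i[2\pi\nu_k t - (\mathbf{k},\mathbf{x})]} + \bar\eta_{\mu\mathbf{k}}e^{i[2\pi\nu_k t - (\mathbf{k},\mathbf{x})]}\}, \qquad (18)$$

with the variables $\eta$ playing roughly the part of the $\xi$'s in (4) and (5). We may
write this result as

$$A_\mu = \sum_{\mathbf{k}}\{\zeta_{\mu\mathbf{k}}e^{i(\mathbf{k},\mathbf{x})} + \bar\zeta_{\mu\mathbf{k}}e^{-i(\mathbf{k},\mathbf{x})}\}, \qquad (19)$$

where

$$\zeta_{\mu\mathbf{k}} = \eta_{\mu\mathbf{k}}e^{-2\pi i\nu_k t}, \qquad \bar\zeta_{\mu\mathbf{k}} = \bar\eta_{\mu\mathbf{k}}e^{2\pi i\nu_k t}. \qquad (20)$$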
The $\zeta$'s and $\bar\zeta$'s are dynamical variables not involving the time explicitly,
since the equations connecting them with the dynamical variables $A_\mu$ do not
involve the time explicitly. They thus satisfy equations of motion of the form (10) of
Chapter VI, rather than (13) of Chapter VI. These equations of motion, when there
are no charged particles present, must be such as to make the $\zeta$'s and $\bar\zeta$'s vary with
time according to equations (20) with the $\eta$'s and $\bar\eta$'s constants.

We must now obtain the quantum conditions for the $\zeta$'s and $\bar\zeta$'s at
a particular time. Let us study first the $\zeta$'s and $\bar\zeta$'s describing one particular Fourier
component of the field, consisting of waves moving in the direction of the $x$-axis
with a definite frequency $\nu$, so that we have $k_y = k_z = 0$, $k_x = 2\pi\nu$. We shall
then have $\zeta_y$ and $\zeta_z$ determined by the field quantities $\mathcal{E}$ and $\mathcal{H}$. According to
the equations

$$\mathcal{E}_y = -\frac{\partial A_y}{\partial t} - \frac{\partial A_0}{\partial y}, \qquad \mathcal{E}_z = -\frac{\partial A_z}{\partial t} - \frac{\partial A_0}{\partial z},$$

$\zeta_y$ and $\zeta_z$ must each be $(2\pi i\nu)^{-1}$ times that Fourier coefficient $(2\pi)^{-1}(h\nu)^{1/2}\xi$ in (4)
belonging to $k_y = k_z = 0$, $k_x = 2\pi\nu$, and to the $y$ and $z$ directions for the electric
vector respectively. Thus from the quantum condition (6)

$$\zeta_y\bar\zeta_y - \bar\zeta_y\zeta_y = h/16\pi^4\nu, \qquad \zeta_z\bar\zeta_z - \bar\zeta_z\zeta_z = h/16\pi^4\nu. \qquad (21)$$

Further, $\zeta_y$ and $\bar\zeta_y$ commute with $\zeta_z$ and $\bar\zeta_z$, since they belong to different degrees
of freedom.
of freedom. 
The $\zeta_x$ and $\zeta_0$ variables are not determined by $\mathcal{E}$ and $\mathcal{H}$, so that we cannot
deduce their quantum conditions in the same way. We have to make some new
assumptions for them. To make equations (21) into a complete relativistic set,
we assume

$$\zeta_x\bar\zeta_x - \bar\zeta_x\zeta_x = h/16\pi^4\nu, \qquad \zeta_0\bar\zeta_0 - \bar\zeta_0\zeta_0 = -h/16\pi^4\nu. \qquad (22)$$

The minus sign on the right-hand side of the second of these equations is required
by relativity. Finally we assume generally

$$[\zeta_\mu, \zeta_\nu] = 0, \qquad [\bar\zeta_\mu, \bar\zeta_\nu] = 0, \qquad [\zeta_\mu, \bar\zeta_\nu] = 0 \quad \text{for } \mu \neq \nu. \qquad (23)$$

This gives us the complete set of quantum conditions for the particular Fourier
component we are considering. Those for the other Fourier components will
be of similar form. Variables belonging to different Fourier components must,
of course, commute.

We can now work out the quantum conditions connecting the potentials
at different points in space-time. Denoting by $A_\mu'(t')$ the value of $A_\mu$ at
the point $\mathbf{x}'$, $t'$, we have in the first place, from (23),

$$[A_\mu'(t'), A_\nu''(t'')] = 0, \qquad \mu \neq \nu. \qquad (24)$$


Further, by the same kind of work as led to (7) or (8), we obtain 


$$\begin{aligned}[A_\mu'(t'), A_\mu''(t'')] &= \pm\sum_{\mathbf{k}}\frac{1}{4\pi^3\nu_k}\sin[(\mathbf{k}, \mathbf{x}'-\mathbf{x}'') - 2\pi\nu_k(t'-t'')]\\ &= \pm\frac{1}{4\pi^3}\int\frac{1}{\nu}\sin[(\mathbf{k}, \mathbf{x}'-\mathbf{x}'') - 2\pi\nu(t'-t'')]\,d^3k,\end{aligned} \qquad (25)$$

the upper sign being taken for $\mu = x$, $y$, or $z$ and the lower sign for $\mu = 0$.
The evaluation of the integral here leads to the result 


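$$\begin{aligned}[A_\mu'(t'), A_\mu''(t'')] &= \pm4\pi\int_0^\infty\!\!\int_{-1}^{1}\nu\sin[2\pi\nu(|\mathbf{x}'-\mathbf{x}''|\cos\theta - t' + t'')]\,d\nu\,d\cos\theta\\ &= \pm\frac{2}{|\mathbf{x}'-\mathbf{x}''|}\int_0^\infty\{\cos[2\pi\nu(|\mathbf{x}'-\mathbf{x}''| + t' - t'')] - \cos[2\pi\nu(|\mathbf{x}'-\mathbf{x}''| - t' + t'')]\}\,d\nu\\ &= \pm\frac{1}{|\mathbf{x}'-\mathbf{x}''|}\{\delta(|\mathbf{x}'-\mathbf{x}''| + t' - t'') - \delta(|\mathbf{x}'-\mathbf{x}''| - t' + t'')\}\end{aligned} \qquad (26)$$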


from (15) of Chapter IV. The expression (26) vanishes when $t' = t''$, so that
the potentials at a given time all commute with each other, that is

$$[A_\mu', A_\nu''] = 0. \qquad (27)$$
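The first step in (26) carries out the integral over the direction of $\mathbf{k}$. As an illustration (the sample values of $\nu$, $r = |\mathbf{x}'-\mathbf{x}''|$ and $\tau = t' - t''$ are arbitrary choices of this sketch), the code below compares Simpson's rule for $\int_{-1}^{1}\sin[2\pi\nu(r\cos\theta - \tau)]\,d\cos\theta$ with the closed form $\{\cos[2\pi\nu(r+\tau)] - \cos[2\pi\nu(r-\tau)]\}/2\pi\nu r$. Multiplying the closed form by $4\pi\nu$ gives the integrand of the second line of (26).

```python
import math

def angular_integral(nu, r, tau, steps=2001):
    """Simpson's rule for  int_{-1}^{1} sin[2 pi nu (r u - tau)] du,  where u = cos(theta)."""
    if steps % 2 == 0:
        raise ValueError("Simpson's rule needs an odd number of sample points")
    h = 2.0 / (steps - 1)
    total = 0.0
    for i in range(steps):
        u = -1.0 + i * h
        weight = 1 if i in (0, steps - 1) else (4 if i % 2 else 2)
        total += weight * math.sin(2 * math.pi * nu * (r * u - tau))
    return total * h / 3

def closed_form(nu, r, tau):
    """{cos[2 pi nu (r + tau)] - cos[2 pi nu (r - tau)]} / (2 pi nu r)."""
    return (math.cos(2 * math.pi * nu * (r + tau))
            - math.cos(2 * math.pi * nu * (r - tau))) / (2 * math.pi * nu * r)

if __name__ == "__main__":
    for nu, r, tau in [(0.5, 1.0, 0.3), (2.0, 0.4, -0.7)]:
        print(angular_integral(nu, r, tau), closed_form(nu, r, tau))
```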
For $t' < t''$, the second term in the $\{\}$ brackets in (26) vanishes, so that we can
change its sign. We then get, from (12) of Chapter IV,

$$[A_\mu'(t'), A_\mu''(t'')] = \pm2\,\delta\{(\mathbf{x}'-\mathbf{x}'')^2 - (t'-t'')^2\}. \qquad (28)$$

Similarly, for $t' > t''$,

$$[A_\mu'(t'), A_\mu''(t'')] = \mp2\,\delta\{(\mathbf{x}'-\mathbf{x}'')^2 - (t'-t'')^2\}. \qquad (29)$$
This gives us all the quantum conditions for the potentials. 
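Equations (28) and (29) rest on the identity $\delta(r^2 - \tau^2) = \{\delta(r - \tau) + \delta(r + \tau)\}/2r$ for $r > 0$, with $r = |\mathbf{x}'-\mathbf{x}''|$ and $\tau = t' - t''$, which turns the $\{\}$ bracket of (26), after the change of sign, into $2r\,\delta\{(\mathbf{x}'-\mathbf{x}'')^2 - (t'-t'')^2\}$. A rough numerical check, assuming a Gaussian regularization of δ and an arbitrary smooth test function, might look like:

```python
import math

def gaussian(u, eps):
    """Normalized Gaussian of width eps; tends to delta(u) as eps -> 0."""
    return math.exp(-u * u / (2 * eps * eps)) / (eps * math.sqrt(2 * math.pi))

def smeared_lhs(f, r, eps=1e-3, lo=-3.0, hi=3.0, steps=300001):
    """Trapezoid estimate of  int f(tau) delta(r^2 - tau^2) dtau  with a Gaussian delta."""
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        tau = lo + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * f(tau) * gaussian(r * r - tau * tau, eps)
    return total * h

def rhs(f, r):
    """{f(r) + f(-r)} / (2 r): the same integral evaluated with the identity."""
    return (f(r) + f(-r)) / (2 * r)

if __name__ == "__main__":
    print(smeared_lhs(math.exp, 1.2), rhs(math.exp, 1.2))
```

Only the two points $\tau = \pm r$ contribute, each weighted by $1/2r$, which is why the two δ functions of (26) combine into the single δ of (28) and (29).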

The useful quantum conditions, in connexion with the dynamical equations,
are those connecting two $A$'s at different places at the same time, namely (27),
those connecting an $A$ and $\partial A/\partial t$ at different places at the same time, and those
connecting two $\partial A/\partial t$'s at different places at the same time, all these
being required since the $A$'s and the $\partial A/\partial t$'s at a given time are independent.
Those involving $\partial A/\partial t$'s may be obtained from (28) and (29), but can be obtained
more easily from (25), since the $\delta$ function in (28) and (29) is an awkward one
to differentiate. Differentiating (25) with respect to $t'$ and putting $t' = t'' = t$,
we obtain


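$$[\partial A_\mu'/\partial t, A_\mu''] = \mp\frac{1}{2\pi^2}\int\cos(\mathbf{k}, \mathbf{x}'-\mathbf{x}'')\,d^3k = \mp4\pi\,\delta(\mathbf{x}'-\mathbf{x}''). \qquad (30)$$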


Again, differentiating (25) with respect to $t'$ and $t''$ and putting $t' = t'' = t$,
we obtain

$$[\partial A_\mu'/\partial t, \partial A_\mu''/\partial t] = \pm\frac{1}{\pi}\int\nu\sin(\mathbf{k}, \mathbf{x}'-\mathbf{x}'')\,d^3k = 0. \qquad (31)$$
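Written out, the differentiations behind (30) and (31) are as follows. Besides (25), the only input is $\int\cos(\mathbf{k},\mathbf{x})\,d^3k = (2\pi)^3\delta(\mathbf{x})$; the last integral vanishes because its integrand changes sign under $\mathbf{k}\to-\mathbf{k}$ while $\nu$ does not.

```latex
\begin{aligned}
&\frac{\partial}{\partial t'}\sin[(\mathbf{k},\mathbf{x}'-\mathbf{x}'') - 2\pi\nu(t'-t'')]\Big|_{t'=t''}
   = -2\pi\nu\cos(\mathbf{k},\mathbf{x}'-\mathbf{x}''),\\
&\frac{\partial^2}{\partial t'\,\partial t''}\sin[(\mathbf{k},\mathbf{x}'-\mathbf{x}'') - 2\pi\nu(t'-t'')]\Big|_{t'=t''}
   = 4\pi^2\nu^2\sin(\mathbf{k},\mathbf{x}'-\mathbf{x}''),\\
&[\partial A_\mu'/\partial t, A_\mu''] = \pm\frac{1}{4\pi^3}\int\frac{-2\pi\nu}{\nu}\cos(\mathbf{k},\mathbf{x}'-\mathbf{x}'')\,d^3k
   = \mp\frac{(2\pi)^3}{2\pi^2}\,\delta(\mathbf{x}'-\mathbf{x}'') = \mp4\pi\,\delta(\mathbf{x}'-\mathbf{x}''),\\
&[\partial A_\mu'/\partial t, \partial A_\mu''/\partial t] = \pm\frac{1}{4\pi^3}\int\frac{4\pi^2\nu^2}{\nu}\sin(\mathbf{k},\mathbf{x}'-\mathbf{x}'')\,d^3k
   = \pm\frac{1}{\pi}\int\nu\sin(\mathbf{k},\mathbf{x}'-\mathbf{x}'')\,d^3k = 0.
\end{aligned}
```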




Let us now suppose the Hamiltonian (11) to be expressed in terms of the $\zeta$
variables. The contribution to it of the particular Fourier component which
we considered above, consisting of waves moving in the direction of the $x$-axis,
will be

$$16\pi^4\nu^2(\bar\zeta_y\zeta_y + \bar\zeta_z\zeta_z), \qquad (32)$$

as may be seen by referring to (11) and to the connexion between the $\zeta$'s and
the $\xi$'s which we had in deriving (21). This contribution to the Hamiltonian
will make $\zeta_y$ and $\zeta_z$ vary with time in the desired way, namely according
to (20), but it will make $\zeta_x$ and $\zeta_0$ constants of the motion, since they commute
with (32). It therefore becomes necessary to modify the Hamiltonian and to replace 
the contribution (32) by 


$$16\pi^4\nu^2(\bar\zeta_x\zeta_x + \bar\zeta_y\zeta_y + \bar\zeta_z\zeta_z - \zeta_0\bar\zeta_0) \qquad (33)$$

in order that all four $\zeta$'s may vary with time according to (20). It is better to put
the $\zeta_0$ to the left of the $\bar\zeta_0$, as will be seen in the next section, equation (44).
With (33) in the Hamiltonian, $\zeta_x$, $\zeta_y$ and $\zeta_z$ may be pictured as describing three
harmonic oscillators of the ordinary kind, and $\zeta_0$ a fourth harmonic oscillator of
negative mass. The total Hamiltonian is now


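$$H_F = 16\pi^4\sum_{\mathbf{k}}\nu_k^2(\bar\zeta_{x\mathbf{k}}\zeta_{x\mathbf{k}} + \bar\zeta_{y\mathbf{k}}\zeta_{y\mathbf{k}} + \bar\zeta_{z\mathbf{k}}\zeta_{z\mathbf{k}} - \zeta_{0\mathbf{k}}\bar\zeta_{0\mathbf{k}}). \qquad (34)$$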


The physical effect of the extra terms that have been introduced here will be 
discussed in the next section. 
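To see why the extra terms in (33), with $\zeta_0$ written to the left of $\bar\zeta_0$, give the $\zeta_0$ oscillator the same time dependence as the others, one can realize the commutation relations (21), (22) and (23) by truncated boson matrices. This is a sketch under stated assumptions: $\zeta_x = c^{1/2}a$, $\bar\zeta_x = c^{1/2}a^\dagger$ and, for the negative-mass oscillator, $\zeta_0 = c^{1/2}b^\dagger$, $\bar\zeta_0 = c^{1/2}b$, where `c` stands for $h/16\pi^4\nu$ and `K` for $16\pi^4\nu^2$, so that $Kc = h\nu$. The numerical values and the cutoff `N` are arbitrary, and the check is restricted to basis states whose occupations are at most `N - 2`, where the truncation has no effect. It confirms $\zeta H - H\zeta = h\nu\,\zeta$ for both $\zeta_x$ and $\zeta_0$; with the Heisenberg equation of motion this gives both the time factor $e^{-2\pi i\nu t}$ of (20).

```python
import numpy as np

def annihilation(n_max):
    """Truncated boson annihilation matrix on the states |0>, ..., |n_max - 1>."""
    return np.diag(np.sqrt(np.arange(1.0, n_max)), k=1)

N = 6     # cutoff for each oscillator (assumption of the sketch)
c = 0.25  # plays the role of h / (16 pi^4 nu)
K = 3.0   # plays the role of 16 pi^4 nu^2, so K * c plays the role of h nu

a = annihilation(N)
I = np.eye(N)
zeta_x = np.sqrt(c) * np.kron(a, I)        # zeta_x zeta_x_bar - zeta_x_bar zeta_x = +c
zeta_x_bar = zeta_x.T
zeta_0_bar = np.sqrt(c) * np.kron(I, a)    # zeta_0 zeta_0_bar - zeta_0_bar zeta_0 = -c
zeta_0 = zeta_0_bar.T

# x and 0 terms of (33), with zeta_0 written to the left of zeta_0_bar
H = K * (zeta_x_bar @ zeta_x - zeta_0 @ zeta_0_bar)

# basis states |i>|j> with i, j <= N - 2, on which the cutoff has no effect
SAFE = [i * N + j for i in range(N - 1) for j in range(N - 1)]

def residual(zeta):
    """Largest entry of (zeta H - H zeta - K c zeta) on the safe basis states."""
    diff = zeta @ H - H @ zeta - K * c * zeta
    return np.abs(diff[:, SAFE]).max()

if __name__ == "__main__":
    print(residual(zeta_x), residual(zeta_0))
```

The negative commutator in (22) and the minus sign in front of $\zeta_0\bar\zeta_0$ in (33) compensate each other, which is the sense in which $\zeta_0$ behaves as an oscillator of negative mass.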

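It may be noted that equations (20), giving the integrals of the equations of
motion for the $\zeta$'s and $\bar\zeta$'s, must be equivalent to

$$\zeta_{\mu\mathbf{k}} = e^{iH_Ft/\hbar}\,\eta_{\mu\mathbf{k}}\,e^{-iH_Ft/\hbar}, \qquad \bar\zeta_{\mu\mathbf{k}} = e^{iH_Ft/\hbar}\,\bar\eta_{\mu\mathbf{k}}\,e^{-iH_Ft/\hbar}, \qquad (35)$$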


from (18) of Chapter VI. 


78. The Supplementary Conditions 


We must now consider what we are to do with the classical equation (16). 
We cannot take it over directly into the quantum theory without getting 
inconsistencies. For example, the P.B. of the left-hand side of (16) with $A_0'$
does not vanish, according to the quantum conditions (24) and (30), and so
this left-hand side itself cannot vanish. The way out of the difficulty was
shown by Fermi.† It consists in adopting a less stringent equation than (16),
namely the equation

$$\Big\{\frac{\partial A_0}{\partial t} + \operatorname{div}\mathbf{A}\Big\}\psi = 0, \qquad (36)$$

and assuming it to hold for any $\psi$ representing a state that can actually occur
in nature. The operator in (36) involves $\mathbf{x}$ and $t$ as parameters, so there is one
equation (36) for each set of values for $\mathbf{x}$ and $t$, and these equations must all hold
for any $\psi$ representing a state that can actually occur. [The $\psi$ in (36) does not
depend on $t$, since we are using the Heisenberg picture, in which each state is
represented by a fixed $\psi$.]

†Enrico Fermi (1932). “Quantum Theory of Radiation.” Reviews of Modern Physics 4(1),
87-132 [doi:10.1103/RevModPhys.4.87].

We shall call a condition, such as (36), which a $\psi$ has to satisfy to represent
an actual state, a supplementary condition. The existence of supplementary
conditions in our theory does not mean any departure from or modification in
the general principles of quantum mechanics. The principle of superposition of
states and the whole of the general theory of states and observables, as given
in Chapter II, apply also when there are supplementary conditions, provided
we impose a further requirement on a linear operator in order that it may
represent an observable, namely the requirement that, when it operates on
any $\psi$ satisfying the supplementary conditions, it changes this $\psi$ into another $\psi$
satisfying the supplementary conditions. We have already had an example of
supplementary conditions in the theory of systems containing several similar 
particles. The condition that only symmetrical wave functions, or only 
antisymmetrical wave functions, represent states that can actually occur in nature, 
is precisely of the same type as condition (36) and is what we are now calling 
a supplementary condition. In this theory the further requirement on linear 
operators in order that they shall represent observables is that they shall be 
symmetrical between the similar particles. 

When we introduce supplementary conditions into our theory we must verify
that they are not too restrictive to allow any $\psi$ at all to satisfy them. If we have
more than one supplementary condition, we can deduce further supplementary
conditions from them by taking P.B.'s of the operators in them; thus if we have

$$U\psi = 0, \qquad V\psi = 0, \qquad (37)$$

we can deduce

$$(UV - VU)\psi = 0, \qquad (38)$$
and so on. To verify that our supplementary conditions are not too restrictive, 
we have to look into all the further supplementary conditions obtainable by 
this procedure to see that they can be satisfied, which we can usually do by 
showing that after a certain point the further supplementary conditions are all 
either identically satisfied or repetitions of the previous ones. 

Since the left-hand side of (36) must vanish for all values of x, t, we can 
resolve it into its Fourier components and each component must vanish separately. 



From (18) and (20) this gives us equations of the form 


$$(\zeta_x - \zeta_0)\psi = 0, \qquad (\bar\zeta_x - \bar\zeta_0)\psi = 0, \qquad (39)$$


where we have taken as a typical case the Fourier component we had in 
the preceding section referring to waves moving in the direction of the x-axis. 
Equations (39) are equivalent to (36) and are more convenient to work with for 
many purposes. The two operators in the two equations (39) we have written 
down commute with each other on account of (22) and (23). It follows that all 
the operators in all the various equations (39) referring to the various Fourier 
components will commute with each other, since variables belonging to different 
Fourier components commute. Hence our supplementary conditions are not too 
restrictive, all the further conditions obtainable from them in the way (38) was 
obtained from (37) being identically satisfied. The effect of the supplementary 
conditions (39) is to stop the $\zeta_0$ and $\zeta_x$ variables from contributing to the number
of degrees of freedom, so that we are left with only two degrees of freedom for each 
frequency and direction of motion. 
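Written with commutators $UV - VU$ (the P.B.'s differ only by the factor $1/i\hbar$), the cancellation that makes the two operators in (39) commute is as follows, using (22) for the first two brackets and (23) for the last two. The relative minus sign in (22) is what makes the first two terms cancel.

```latex
\begin{aligned}
(\zeta_x - \zeta_0)(\bar\zeta_x - \bar\zeta_0) - (\bar\zeta_x - \bar\zeta_0)(\zeta_x - \zeta_0)
  &= (\zeta_x\bar\zeta_x - \bar\zeta_x\zeta_x) + (\zeta_0\bar\zeta_0 - \bar\zeta_0\zeta_0)
     - (\zeta_x\bar\zeta_0 - \bar\zeta_0\zeta_x) - (\zeta_0\bar\zeta_x - \bar\zeta_x\zeta_0)\\
  &= \frac{h}{16\pi^4\nu} - \frac{h}{16\pi^4\nu} - 0 - 0 = 0.
\end{aligned}
```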

Since equation (16) is not valid and has to be replaced by a supplementary 
condition, any consequences of (16) in the ordinary Maxwell theory will not 
be valid in the quantum theory and will have to be replaced by supplementary 
conditions. The equations 


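$$\operatorname{div}\mathcal{H} = 0, \qquad \partial\mathcal{H}/\partial t = -\operatorname{curl}\mathcal{E} \qquad (40)$$

follow simply from the equations defining $\mathcal{E}$ and $\mathcal{H}$,

$$\mathcal{E} = -\partial\mathbf{A}/\partial t - \operatorname{grad}A_0, \qquad \mathcal{H} = \operatorname{curl}\mathbf{A}, \qquad (41)$$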


and are therefore valid also in the quantum theory. The other Maxwell equations 
for empty space, however, namely 


$$\operatorname{div}\mathcal{E} = 0, \qquad \partial\mathcal{E}/\partial t = \operatorname{curl}\mathcal{H}, \qquad (42)$$


can be derived only with the help of (16), and are thus not valid in the quantum 
theory. They must be replaced by 


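$$\{\operatorname{div}\mathcal{E}\}\psi = 0, \qquad \{\partial\mathcal{E}/\partial t - \operatorname{curl}\mathcal{H}\}\psi = 0, \qquad (43)$$

holding for any $\psi$ representing a state that can actually occur. The failure of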
the second of equations (42) is connected with the change which we must make in 
the Hamiltonian from what it was when we derived (15). 




The extra terms which we introduced in passing from (32) to (33) are 


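$$16\pi^4\nu^2(\bar\zeta_x\zeta_x - \zeta_0\bar\zeta_0) = 8\pi^4\nu^2\{(\bar\zeta_x + \bar\zeta_0)(\zeta_x - \zeta_0) + (\zeta_x + \zeta_0)(\bar\zeta_x - \bar\zeta_0)\} \qquad (44)$$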


from (22) and (23). Thus these extra terms vanish when multiplied into any $\psi$
satisfying (39). Hence the total change in the Hamiltonian, i.e. the difference
between (34) and (11), will vanish when multiplied into any $\psi$ representing a state
that can actually occur and will therefore be physically unobservable. The new 
Hamiltonian may be put in the form 


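$$H_F = \frac{1}{8\pi}\int\sum_\mu\pm\{(\operatorname{grad}A_\mu, \operatorname{grad}A_\mu) + (\partial A_\mu/\partial t)^2\}\,d^3x - \tfrac12\sum h\nu, \qquad (45)$$

where the + sign is to be taken for $\mu = x, y, z$ and the − sign for $\mu = 0$, by the same
kind of analysis as that by which the old one was put in the form (13). This new
Hamiltonian will lead, of course, to the correct equations of motion for the $A_\mu$
and $\partial A_\mu/\partial t$, namely

$$[A_\mu, H_F] = \partial A_\mu/\partial t, \qquad (46)$$

$$[\partial A_\mu/\partial t, H_F] = \nabla^2 A_\mu, \qquad (47)$$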


as may easily be verified from the quantum conditions (24), (27), (30), and (31). 
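The rearrangement (44), and the reason for putting $\zeta_0$ to the left of $\bar\zeta_0$ in (33), can be checked in a truncated-matrix realization of (22) and (23). As before, this is only a sketch: $\zeta_x = c^{1/2}a$ and $\zeta_0 = c^{1/2}b^\dagger$ are an assumed representation, `c` stands for $h/16\pi^4\nu$, both sides are divided by the common factor $8\pi^4\nu^2$, and only basis states untouched by the cutoff are compared. With the ordering of (33) the two sides agree; with $\bar\zeta_0$ on the left they differ by the constant $2c$.

```python
import numpy as np

def annihilation(n_max):
    """Truncated boson annihilation matrix on the states |0>, ..., |n_max - 1>."""
    return np.diag(np.sqrt(np.arange(1.0, n_max)), k=1)

N, c = 6, 0.25                       # cutoff and a stand-in value for h / (16 pi^4 nu)
a, I = annihilation(N), np.eye(N)
zx = np.sqrt(c) * np.kron(a, I)      # zeta_x
zx_bar = zx.T                        # zeta_x bar
z0_bar = np.sqrt(c) * np.kron(I, a)  # zeta_0 bar
z0 = z0_bar.T                        # zeta_0, so that z0 z0_bar - z0_bar z0 = -c
SAFE = [i * N + j for i in range(N - 1) for j in range(N - 1)]  # states unaffected by the cutoff

# Both sides of (44) divided by the common factor 8 pi^4 nu^2.
rhs = (zx_bar + z0_bar) @ (zx - z0) + (zx + z0) @ (zx_bar - z0_bar)
lhs_as_in_33 = 2 * (zx_bar @ zx - z0 @ z0_bar)      # zeta_0 to the left of zeta_0 bar
lhs_other_order = 2 * (zx_bar @ zx - z0_bar @ z0)   # zeta_0 bar to the left

def max_gap(lhs):
    """Largest entry of lhs - rhs on the basis states unaffected by the cutoff."""
    return np.abs((lhs - rhs)[:, SAFE]).max()

if __name__ == "__main__":
    print(max_gap(lhs_as_in_33), max_gap(lhs_other_order), 2 * c)
```

Each term on the right of (44) ends in one of the operators of (39), which is why the extra terms annihilate any $\psi$ satisfying the supplementary conditions.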

It should be noticed that the field quantities $\mathcal{E}$ and $\mathcal{H}$ commute with
the operators in the supplementary conditions. This follows from the fact that,
if we take a Fourier component referring to waves moving in the direction of
the $x$-axis, from (41) its contributions to $\mathcal{E}_y$, $\mathcal{E}_z$, $\mathcal{H}_y$ and $\mathcal{H}_z$ will depend only
on $\zeta_y$, $\zeta_z$, $\bar\zeta_y$ and $\bar\zeta_z$, its contribution to $\mathcal{H}_x$ will vanish, and its contribution to $\mathcal{E}_x$
will depend only on $(\zeta_x - \zeta_0)$ and $(\bar\zeta_x - \bar\zeta_0)$. Each of these $\zeta$'s or $\bar\zeta$'s or combination
of $\zeta$'s or $\bar\zeta$'s commutes with the operators in (39). From this commuting of $\mathcal{E}$
and $\mathcal{H}$ with the operators in the supplementary conditions we can infer that,
when $\mathcal{E}$ or $\mathcal{H}$ is multiplied into a $\psi$ satisfying the supplementary conditions,
it will give another $\psi$ satisfying the supplementary conditions, and hence it fulfils
the new requirement for being an observable. Further, the field energy $H_F$ is
composed of terms which are functions of $\mathcal{E}$ and $\mathcal{H}$ and terms which vanish
when multiplied into a $\psi$ satisfying the supplementary conditions. Hence $H_F$
multiplied into a $\psi$ satisfying the supplementary conditions gives another $\psi$
satisfying the supplementary conditions, and thus $H_F$ fulfils the new requirement
for being an observable.




79. Interaction of Field and Particles 


We shall now consider how the presence of charged particles in the field is to be 
taken into account, a problem that was first solved by Heisenberg and Pauli.† We
can attack this problem by passing from the Heisenberg picture, which we have
used exclusively in the three preceding sections, to the Schrödinger picture and
setting up the Schrödinger wave equation with the Hamiltonian

$$H = H_F + \sum_r H_r, \qquad (48)$$

where $H_F$ is the energy of the field alone, given by (34) or (45), and $H_r$ is the energy
of the $r$th particle in interaction with the field. If we assume the particles are
described by wave equations of the form of the relativistic wave equation for the
electron, equation (9) of Chapter XII, we should have for $H_r$

$$H_r = e_rA_{0r} - (\boldsymbol{\alpha}_r, \mathbf{p}_r - e_r\mathbf{A}_r) - \alpha_{mr}m_r, \qquad (49)$$

where $e_r$ and $m_r$ are the charge and mass of the $r$th particle, $\mathbf{p}_r$ and the $\alpha_r$'s
are dynamical variables describing this particle, and $A_{0r}$, $\mathbf{A}_r$ are the potentials at
the point where this particle is situated. These potentials are of the form (19),
where the $\zeta$'s and $\bar\zeta$'s are now (like all dynamical variables in the Schrödinger
picture) fixed operators, but satisfy the same quantum conditions as before.

We have as wave equation 


$$i\hbar\frac{d\psi}{dt} = \Big\{H_F + \sum_r H_r\Big\}\psi. \qquad (50)$$


This wave equation is not at all relativistic in its form, since it involves only one 
time variable t, but many sets of space variables x, y, z, one set for each particle. 
In order to get it into a relativistic form, it is necessary to introduce several time 
variables $t_1, t_2, \ldots, t_r, \ldots$, one for each particle. This can best be effected with
the help of a certain contact transformation, by a method due to Rosenfeld.‡ We
make the contact transformation of dynamical variables 


$$B^* = e^{iH_Ft/\hbar}\,B\,e^{-iH_Ft/\hbar} \qquad (51)$$

and put $\psi^* = e^{iH_Ft/\hbar}\psi$. We then get, as the wave equation for $\psi^*$,

$$i\hbar\frac{d\psi^*}{dt} = e^{iH_Ft/\hbar}\Big\{i\hbar\frac{d\psi}{dt} - H_F\psi\Big\} = e^{iH_Ft/\hbar}\sum_r H_r\psi = \sum_r H_r^*\psi^*. \qquad (52)$$

[This work, it may be noted, is essentially equivalent to that leading to
equation (14) of Chapter VIII.] The wave equation (52) for $\psi^*$ differs from the
previous one (50) for $\psi$ through the disappearance of $H_F$ and the replacement of
the $H_r$'s by $H_r^*$'s.

†Werner Heisenberg, Wolfgang Pauli (1929). „Zur Quantendynamik der Wellenfelder.“
Zeitschrift für Physik 56(1-2), 1-61 [doi:10.1007/bf01340129]; Werner Heisenberg,
Wolfgang Pauli (1930). „Zur Quantentheorie der Wellenfelder. II.“ Zeitschrift für Physik 59(3-4),
168-190 [doi:10.1007/bf01341423]. English translations by D. H. Delphenich can be found
at http://neo-classical-physics.info/uploads/3/0/6/5/3065888/heisenberg_and_pauli_-_qed_i.pdf
and http://neo-classical-physics.info/uploads/3/0/6/5/3065888/heisenberg_and_pauli_-_qed_ii.pdf

‡Léon Rosenfeld (1932). „Über eine mögliche Fassung des Diracschen Programms zur
Quantenelektrodynamik und deren formalen Zusammenhang mit der Heisenberg-Paulischen
Theorie.“ Zeitschrift für Physik 76(11-12), 729-734 [doi:10.1007/bf01341566].
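The content of (52) is that the transformation (51) removes $H_F$ from the wave equation. A small numerical sketch, assuming $\hbar = 1$ and random Hermitian matrices standing in for $H_F$ and for $\sum_r H_r$ (nothing here is specific to the field), solves (50) exactly and checks by a central finite difference that $\psi^* = e^{iH_Ft}\psi$ obeys $i\,d\psi^*/dt = H^*\psi^*$ with $H^* = e^{iH_Ft}(\sum_r H_r)e^{-iH_Ft}$.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def random_hermitian(dim):
    """A random Hermitian matrix, standing in for H_F or for the sum of the H_r."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (m + m.conj().T) / 2

def exp_i(hermitian, s):
    """exp(i s H) for Hermitian H, from its eigendecomposition (hbar = 1)."""
    w, v = np.linalg.eigh(hermitian)
    return (v * np.exp(1j * s * w)) @ v.conj().T

H_F = random_hermitian(5)
H_P = random_hermitian(5)            # plays the role of sum_r H_r
psi0 = rng.normal(size=5) + 1j * rng.normal(size=5)
psi0 /= np.linalg.norm(psi0)

def psi(t):
    """Solution of (50): i dpsi/dt = (H_F + H_P) psi."""
    return exp_i(H_F + H_P, -t) @ psi0

def psi_star(t):
    """psi* = exp(i H_F t) psi, the transformed wave function."""
    return exp_i(H_F, t) @ psi(t)

def residual_52(t, dt=1e-5):
    """Size of  i dpsi*/dt - H_P*(t) psi*(t),  with H_P* = exp(iH_F t) H_P exp(-iH_F t)."""
    lhs = 1j * (psi_star(t + dt) - psi_star(t - dt)) / (2 * dt)
    H_P_star = exp_i(H_F, t) @ H_P @ exp_i(H_F, -t)
    return np.linalg.norm(lhs - H_P_star @ psi_star(t))

if __name__ == "__main__":
    print(residual_52(0.8))
```

The residual is limited only by the finite-difference step; the time dependence that $H_F$ produced in (50) has been moved into the operators $H_r^*$.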

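Let us now examine $H_r^*$. From (49) and (51) we obtain

$$H_r^* = e^{iH_Ft/\hbar}\{e_rA_{0r} - (\boldsymbol{\alpha}_r, \mathbf{p}_r - e_r\mathbf{A}_r) - \alpha_{mr}m_r\}e^{-iH_Ft/\hbar} = e_rA_{0r}^* - (\boldsymbol{\alpha}_r, \mathbf{p}_r - e_r\mathbf{A}_r^*) - \alpha_{mr}m_r, \qquad (53)$$

since $H_F$ commutes with the $\mathbf{p}$'s and with the $\alpha$'s. Further, from (19), with $\mathbf{x}_r$
denoting the position of the $r$th particle,

$$A_{\mu r}^* = \sum_{\mathbf{k}}\{\zeta_{\mu\mathbf{k}}^*e^{i(\mathbf{k},\mathbf{x}_r)} + \bar\zeta_{\mu\mathbf{k}}^*e^{-i(\mathbf{k},\mathbf{x}_r)}\},$$

where $\zeta_{\mu\mathbf{k}}^* = e^{iH_Ft/\hbar}\zeta_{\mu\mathbf{k}}e^{-iH_Ft/\hbar}$.

Now $\zeta_{\mu\mathbf{k}}$ is a constant operator and is thus like the $\eta_{\mu\mathbf{k}}$ of the preceding section,
so that, from (35), $\zeta_{\mu\mathbf{k}}^*$ must vary with $t$ in the same way as the $\zeta_{\mu\mathbf{k}}$ of the preceding
section did, i.e. according to the law (20). Thus

$$\zeta_{\mu\mathbf{k}}^* = \zeta_{\mu\mathbf{k}}e^{-2\pi i\nu_k t}, \qquad \bar\zeta_{\mu\mathbf{k}}^* = \bar\zeta_{\mu\mathbf{k}}e^{2\pi i\nu_k t}, \qquad (54)$$

so that

$$A_{\mu r}^* = \sum_{\mathbf{k}}\{\zeta_{\mu\mathbf{k}}e^{-i[2\pi\nu_k t - (\mathbf{k},\mathbf{x}_r)]} + \bar\zeta_{\mu\mathbf{k}}e^{i[2\pi\nu_k t - (\mathbf{k},\mathbf{x}_r)]}\}. \qquad (55)$$

Equations (53) and (55) give us $H_r^*$. They show that $H_r^*$ is of relativistic form in
the space-time variables $\mathbf{x}_r$, $t$. We shall later require to use formula (55) applied
to a general point $\mathbf{x}$, not necessarily one where a particle is situated, when it will
read

$$A_\mu^* = \sum_{\mathbf{k}}\{\zeta_{\mu\mathbf{k}}e^{-i[2\pi\nu_k t - (\mathbf{k},\mathbf{x})]} + \bar\zeta_{\mu\mathbf{k}}e^{i[2\pi\nu_k t - (\mathbf{k},\mathbf{x})]}\}. \qquad (56)$$

These $A_\mu^*$'s for various values of $\mathbf{x}$ and $t$ are similar to, and satisfy the same
commutability relations as, the $A_\mu$'s in the Heisenberg picture with no charged
particles present.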

We now introduce a wave function $\Psi$ which involves, not just a single time
variable $t$, like $\psi^*$ does, but a whole set of them $t_r$, one for each particle, and
suppose it satisfies the following set of wave equations

$$i\hbar\frac{\partial\Psi}{\partial t_r} = H_r^*(t_r)\Psi, \qquad (57)$$

where $H_r^*(t_r)$ is the $H_r^*$ given by (53) and (55) with $t_r$ substituted for $t$. There is
one of these wave equations for each particle. These wave equations are obviously 
of relativistic form and we assume them as the fundamental equations describing 
the relativistic interaction of several charged particles with the field. 

In order to justify the replacement of the wave equation (52) by the wave
equations (57), we ought to verify, firstly, that when we put all the time variables $t_r$
in $\Psi$ equal to $t$, we get a $\psi^*$ which satisfies (52), and secondly, that every $\psi^*$
satisfying (52) can be generalized to a $\Psi$ satisfying (57). The first of these
conditions follows at once from the fact that the operator $d/dt$ applied after
we have put all the $t_r$ equal to $t$ is equivalent to $\sum_r\partial/\partial t_r$ applied before, so that
the equation obtained by summing (57) over all $r$ goes over into (52) on putting
all the $t_r$ equal to $t$. To verify the second, we note that we may take $\Psi$ with
each $t_r$ put equal to some given $t$ to be arbitrary, and equations (57) will then
have a solution provided they are consistent, in the sense that $\partial/\partial t_s$ of $\partial\Psi/\partial t_r$
given by one of equations (57) equals $\partial/\partial t_r$ of $\partial\Psi/\partial t_s$ given by another, for
every pair $r$, $s$. The condition for this consistency is easily seen to be that all
the operators $H_r^*(t_r)$ shall commute with each other. We see from (24) and (26)
that they do commute provided

$$(t_r - t_s)^2 < (\mathbf{x}_r - \mathbf{x}_s)^2 \qquad (58)$$

for every pair $r$, $s$. Thus we have the conditions (58) putting a restriction on
the domain of existence of the wave function $\Psi$, and inside this domain of existence
we can obtain a $\Psi$ corresponding to any $\psi^*$ that satisfies (52).
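Condition (58) says that every pair of particle events $(\mathbf{x}_r, t_r)$, $(\mathbf{x}_s, t_s)$ must be space-like separated. A minimal helper that tests whether a given set of arguments of $\Psi$ lies in its domain of existence might look like this; the input format (a list of position and time pairs, with the velocity of light equal to 1 as in the text) is an assumption of the sketch.

```python
from itertools import combinations

def in_domain_58(events):
    """Check (58): (t_r - t_s)^2 < (x_r - x_s)^2 for every pair of particles.

    `events` is a list of (position, time) pairs, positions being 3-tuples,
    in units with the velocity of light equal to 1 (as in the text).
    """
    for (x_r, t_r), (x_s, t_s) in combinations(events, 2):
        separation2 = sum((p - q) ** 2 for p, q in zip(x_r, x_s))
        if (t_r - t_s) ** 2 >= separation2:
            return False
    return True

if __name__ == "__main__":
    print(in_domain_58([((0, 0, 0), 0.0), ((1, 0, 0), 0.5)]))   # space-like pair
    print(in_domain_58([((0, 0, 0), 0.0), ((1, 0, 0), 1.5)]))   # time-like pair
```

Equal times at distinct positions always satisfy (58), so the ordinary single-time wave function $\psi^*$ is always recovered inside the domain.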

The restriction (58) on the domain of existence of $\Psi$ is to be expected also
from the physical interpretation of the wave function. The natural interpretation
to assume for the wave function $\Psi$, as a generalization of that for the wave
function $\psi$ or $\psi^*$, is that the square of its modulus for any set of values for
the $\mathbf{x}_r$, $t_r$ is proportional to the probability of each of the particles being in
a small volume about the point $\mathbf{x}_r$ at the time $t_r$, with the field in a specified
state (i.e. with specified photons in existence). Such an interpretation would
not be permissible outside the region (58), because of the interference that there
would then be between the observations of the positions of the various particles at
the various times.

The scheme of wave equations (57) is assumed to describe completely 
the interaction between these various charged particles and the field and should
therefore include the interaction between one charged particle and another, since in 
field theory there is no direct interaction between one particle and another but only 
an indirect one, arising from each particle influencing the field in its neighbourhood 
and this influence spreading out till it reaches the other particles. Thus these 
equations should include forces between the charged particles of a type which 
reduces to the Coulomb forces in non-relativistic approximation. It is not at all
evident that the equations do include such forces, since they appear to take into 
account the action of the field on the particles, but not the action of the particles 
on the field. In order to verify that they are complete, we must go back to 
the Heisenberg picture and see that the equations of motion of the field are then 
analogous to the classical equations of motion of the field with the influence of 
the charged particles duly taken into account. 

Going back to the Heisenberg picture requires us to put all the time variables 
equal and to take the Hamiltonian (48). With this Hamiltonian we get as the 
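equation of motion for $\partial A_\mu/\partial t$

$$\frac{\partial^2 A_\mu}{\partial t^2} = \Big[\frac{\partial A_\mu}{\partial t}, H_F + \sum_r H_r\Big] = \nabla^2 A_\mu + \sum_r\Big[\frac{\partial A_\mu}{\partial t}, H_r\Big]. \qquad (59)$$

Now, from (30),

$$\Big[\frac{\partial A_0}{\partial t}, H_r\Big] = e_r\Big[\frac{\partial A_0}{\partial t}, A_{0r}\Big] = 4\pi e_r\,\delta(\mathbf{x} - \mathbf{x}_r),$$

and similarly

$$\Big[\frac{\partial\mathbf{A}}{\partial t}, H_r\Big] = e_r\Big[\frac{\partial\mathbf{A}}{\partial t}, (\boldsymbol{\alpha}_r, \mathbf{A}_r)\Big] = -4\pi e_r\boldsymbol{\alpha}_r\,\delta(\mathbf{x} - \mathbf{x}_r).$$

Thus (59) becomes

$$\frac{\partial^2 A_0}{\partial t^2} - \nabla^2 A_0 = 4\pi\sum_r e_r\,\delta(\mathbf{x} - \mathbf{x}_r), \qquad \frac{\partial^2\mathbf{A}}{\partial t^2} - \nabla^2\mathbf{A} = -4\pi\sum_r e_r\boldsymbol{\alpha}_r\,\delta(\mathbf{x} - \mathbf{x}_r). \qquad (60)$$

These are the equations required by Maxwell's theory for charges $e_r$ at the various
points $\mathbf{x}_r$ moving with the velocities $-\boldsymbol{\alpha}_r$, which are the velocities required by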
the relativistic theory of the electron. Hence the action of the particles on 
the field, giving rise to Coulomb forces between the particles in non-relativistic 
approximation, is correctly taken into account in the Hamiltonian (48), and thus 
also in the scheme of equations (57). 

To complete the theory, we must now obtain the supplementary conditions
to go with the wave equations (57). The conditions which naturally suggest
themselves are (39) with $\Psi$ instead of $\psi$. These would be equivalent to (36) with
the constant operators $A_\mu$ of the Schrödinger picture replaced by $A_\mu^*$'s and $\Psi$
instead of $\psi$. These conditions need amendment, though, as may be seen from
the following considerations. Equations (57) may be regarded as supplementary
conditions and have to be consistent with any further supplementary conditions, in
the way discussed in connexion with equations (37). (This consistency requirement
is equivalent to the requirement that, if the further supplementary conditions hold
for some value for each of the $t_r$, they shall hold generally.) Now the operators
in (57) are built up from

$$\mathbf{p}_r - e_r\mathbf{A}_r^*(t_r), \qquad W_r - e_rA_{0r}^*(t_r), \qquad (61)$$

$W_r$ meaning the energy operator of the $r$th particle, i.e. $i\hbar\,\partial/\partial t_r$, when operating
to the right. We have, remembering that our $A^*$'s satisfy the same commutability
relations as the $A$'s in the Heisenberg picture with no charged particles present,
namely equations (24), (27), (28) and (29),


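$$\Big[\frac{\partial A_0^*}{\partial t} + \operatorname{div}\mathbf{A}^*, W_s - e_sA_{0s}^*(t_s)\Big] = -e_s\Big[\frac{\partial A_0^*}{\partial t}, A_{0s}^*(t_s)\Big] = -e_s\frac{\partial}{\partial t}[A_0^*, A_{0s}^*(t_s)] = \mp2e_s\frac{\partial}{\partial t}\delta\{(\mathbf{x} - \mathbf{x}_s)^2 - (t - t_s)^2\},$$

the minus or plus sign being taken according to whether $t > t_s$ or $t < t_s$. Thus

$$\Big[\frac{\partial A_0^*}{\partial t} + \operatorname{div}\mathbf{A}^*, W_s - e_sA_{0s}^*(t_s)\Big] = \pm2e_s\frac{\partial}{\partial t_s}\delta\{(\mathbf{x} - \mathbf{x}_s)^2 - (t - t_s)^2\} = \mp2e_s[\delta\{(\mathbf{x} - \mathbf{x}_s)^2 - (t - t_s)^2\}, W_s - e_sA_{0s}^*(t_s)],$$

so that

$$\frac{\partial A_0^*}{\partial t} + \operatorname{div}\mathbf{A}^* + \sum_s\pm2e_s\,\delta\{(\mathbf{x} - \mathbf{x}_s)^2 - (t - t_s)^2\} \qquad (62)$$

commutes with $W_s - e_sA_{0s}^*(t_s)$, the plus or minus sign being taken for each term
in the sum according to whether $t > t_s$ or $t < t_s$. Similarly (62) commutes with
all the other quantities (61). We therefore take as our supplementary conditions

$$\Big\{\frac{\partial A_0^*}{\partial t} + \operatorname{div}\mathbf{A}^* + \sum_s\pm2e_s\,\delta\{(\mathbf{x} - \mathbf{x}_s)^2 - (t - t_s)^2\}\Big\}\Psi = 0. \qquad (63)$$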


There is one of these conditions for each point $\mathbf{x}$, $t$ in space-time, the $\mathbf{x}$,
$t$ variables being quite arbitrary and independent of all the $\mathbf{x}_r$, $t_r$ variables.
These supplementary conditions are consistent with the wave equations (57), 
since the operator (62) commutes (for all values of x, t) with the operators 
in (57). The additional terms in (63), involving the 6 function, are necessary 
to secure this consistency. These additional terms do not interfere with the mutual 
consistency of the various equations (63) obtained by giving different values to x, 
t, since they commute with all the other operators in these equations. 


80. The Quantization of Electron Waves 


If the charged particles in the preceding section are electrons, we should have
to impose the additional condition on $\Psi$ that it shall be antisymmetrical between
all the electrons. We can then put the equations for $\Psi$ into a more concise and more
symbolic form, by the use of a procedure of quantization of electron wave functions,
which was discovered by Jordan and Wigner.† This procedure is the analogue for
particles satisfying the exclusion principle of the second quantization discussed 
in §62 for particles satisfying the Einstein-Bose statistics, and we shall deal with 
it on corresponding lines to those used in §62 for the Einstein-Bose case. 

We begin by describing the states of our assembly of particles by
antisymmetrical representatives $(q_1q_2\ldots q_n|)$, of the kind we had in §57.
We introduce the observables $n_a$ having the same meaning as in §62, i.e. $n_1$ is
the number of $q$'s equal to $q^{(1)}$, $n_2$ the number equal to $q^{(2)}$, and so on. Each of
these $n$'s now has as eigenvalues only 0 and 1, since for any $n$ having a value greater
than 1 the antisymmetrical wave function $(q_1q_2\ldots q_n|)$ would vanish identically.
We pass over to the representatives $(n_1n_2\ldots|)$, assuming as the connexion between
the two representatives of any state

$$(n_1n_2\ldots|) = \pm(q_1q_2\ldots q_n|). \qquad (64)$$

This equation is the analogue of (2) of Chapter XI. The normalizing
factor $[n!/(n_1!\,n_2!\,n_3!\ldots)]^{1/2}$ in (2) of Chapter XI is not required in (64) on account
of the eigenvalues of the $n$'s now being restricted to 0 and 1. We need, however,
a $\pm$ sign in (64), which we did not have in (2) of Chapter XI, since for given
values of the $n$'s, the values of the $q$'s in $(q_1q_2\ldots q_n|)$ are fixed but not their order,
so that, $(q_1q_2\ldots q_n|)$ being antisymmetrical, there will be an ambiguity in its sign.
We must set up a rule for specifying the sign in any particular case. We can
do this by arranging all the eigenvalues of a $q$ arbitrarily in some definite order,
say the order

$$q^{(1)}, q^{(2)}, q^{(3)}, \ldots, \qquad (65)$$

which may conveniently be taken the same as the order in which the $n$'s are
written in $(n_1n_2\ldots|)$, and then requiring that the + sign shall be taken when
the $q$'s in (64), which form a selection from the total set (65), can be brought
into the order in which they appear in (65) by an even number of interchanges,
and the − sign otherwise.

†Pascual Jordan, Eugene Paul Wigner (1928). „Über das Paulische Äquivalenzverbot.“ Zeitschrift
für Physik 47(9-10), 631-651 [doi:10.1007/bf01331938].

We must now obtain the transformation law for the representative of
a dynamical variable $U$, of the form of (3) of Chapter XI, from the $q$-representation
to the $n$-representation. Following through the same method as in Chapter XI and
transforming the equation

$$\psi_2 = U\psi_1, \qquad (66)$$


we obtain, corresponding to equation (8) of Chapter XI,

$$(n_1n_2\ldots|2) = \sum_a n_aU_{aa}(n_1n_2\ldots|1) + \sum_a\sum_{b\neq a}\pm U_{ab}(n_1n_2\ldots n_a - 1\ldots n_b + 1\ldots|1), \qquad (67)$$

where $(n_1n_2\ldots n_a - 1\ldots n_b + 1\ldots|1)$ is understood to be zero if either $n_a - 1$
or $n_b + 1$ is not 0 or 1. With regard to the ambiguity of sign occurring in (67),
we must take the − sign in those cases where there is a − sign in one and only one
of the equations

$$(n_1n_2\ldots|1) = \pm(q_1q_2\ldots q_n|1), \qquad (n_1n_2\ldots n_a - 1\ldots n_b + 1\ldots|1) = \pm(q_1q_2\ldots q^{(b)}\text{ for }q^{(a)}\ldots q_n|1). \qquad (68)$$

It is easily seen that this condition for the − sign is the same as the condition that
the number of $q$'s mentioned on the right-hand side of (68) that lie between $q^{(a)}$
and $q^{(b)}$ in the sequence (65) shall be odd, or that $\sum_c n_c$, where the summation
is taken over all $c$ for which $q^{(c)}$ lies between $q^{(a)}$ and $q^{(b)}$ in the sequence (65),
shall be odd. When $\sum_c n_c$ is even, we must have the + sign.
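The equivalence just stated can be checked by brute force. In the sketch below the eigenvalues in the sequence (65) are labelled by the integers 0, 1, 2, ... (an assumption of the sketch), the sign in (64) for an ordered list of occupied labels is the parity of its number of inversions, and the sign in (67) is obtained from (68) by replacing $q^{(a)}$ with $q^{(b)}$ in the same position. The result is compared with the parity of the number of occupied labels lying strictly between $a$ and $b$.

```python
import itertools

def sign_64(qs):
    """Sign in (64): +1 if the list qs can be sorted by an even number of interchanges."""
    inversions = sum(1 for i, j in itertools.combinations(range(len(qs)), 2) if qs[i] > qs[j])
    return -1 if inversions % 2 else 1

def sign_67_direct(qs, a, b):
    """Sign in (67) from (68): -1 exactly when one (and only one) of the two signs is -1."""
    replaced = [b if q == a else q for q in qs]
    return sign_64(qs) * sign_64(replaced)

def sign_67_rule(qs, a, b):
    """Sign in (67) from the rule: parity of the number of occupied q's strictly between a and b."""
    lo, hi = sorted((a, b))
    between = sum(1 for q in qs if lo < q < hi)
    return -1 if between % 2 else 1

def rule_holds(n_states=6, n_particles=3):
    """Compare the two sign prescriptions over every arrangement of occupied labels."""
    for occupied in itertools.combinations(range(n_states), n_particles):
        for qs in itertools.permutations(occupied):
            for a in occupied:
                for b in range(n_states):
                    if b not in occupied and sign_67_direct(qs, a, b) != sign_67_rule(qs, a, b):
                        return False
    return True

if __name__ == "__main__":
    print(rule_holds(), rule_holds(7, 4))
```

The reason is visible in the code: moving one entry from the value $a$ to the value $b$ flips its order relation with exactly those entries whose values lie between $a$ and $b$, and leaves all other inversions unchanged.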

If we take any $n_a$ and form $1 - 2n_a$, we get an observable having as eigenvalues 1
and −1, and thus of the same nature as the $\sigma$'s dealt with in §19. Let us put

$$1 - 2n_a = \sigma_{za} \qquad (69)$$

and introduce the $\sigma_{xa}$ and $\sigma_{ya}$ that are associated with it. Then in a representation
with $\sigma_{za}$ diagonal, $\tfrac12(\sigma_{xa} - i\sigma_{ya})$ and $\tfrac12(\sigma_{xa} + i\sigma_{ya})$ will be represented, according
to (57) of §19, by

$$\begin{pmatrix}0 & 0\\ 1 & 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} \qquad (70)$$

respectively, and will thus be to a certain extent analogous to the $e^{iw_a}$ and $e^{-iw_a}$,
or to the $\bar\xi_a$ and $\xi_a$, of §62. There will be one set of $\sigma_{xa}$, $\sigma_{ya}$ and $\sigma_{za}$ for each $a$,
and members of one set will commute with members of any other set.

The form of the representatives (70) shows that when $\tfrac12(\sigma_{xa} - i\sigma_{ya})$
or $\tfrac12(\sigma_{xa} + i\sigma_{ya})$ is multiplied into a $\psi$ whose representative is $(n_1n_2\ldots n_a\ldots|)$,
the representative of the product is $(n_1n_2\ldots n_a - 1\ldots|)$ or $(n_1n_2\ldots n_a + 1\ldots|)$
respectively. Hence equation (67) is the representative of

$$\psi_2 = \sum_a n_aU_{aa}\psi_1 + \sum_a\sum_{b\neq a}\pm U_{ab}\,\tfrac12(\sigma_{xa} - i\sigma_{ya})\,\tfrac12(\sigma_{xb} + i\sigma_{yb})\,\psi_1.$$

Since this holds whenever (66) holds, we must have

$$U = \sum_a n_aU_{aa} + \sum_a\sum_{b\neq a}\pm\tfrac12(\sigma_{xa} - i\sigma_{ya})\,U_{ab}\,\tfrac12(\sigma_{xb} + i\sigma_{yb}). \qquad (71)$$


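In order to get rid of the ± sign in this result, we introduce
the dynamical variables

$$\xi_a = \sigma_{z1}\sigma_{z2}\sigma_{z3}\cdots\sigma_{z,a-1}\,\tfrac12(\sigma_{xa} + i\sigma_{ya}), \qquad (72)$$

where the product of $\sigma_z$'s consists of the $\sigma_z$'s corresponding to all the $q$'s in
the sequence (65) up to $q^{(a-1)}$. The conjugate complex of $\xi_a$ is

$$\bar\xi_a = \tfrac12(\sigma_{xa} - i\sigma_{ya})\,\sigma_{z1}\sigma_{z2}\sigma_{z3}\cdots\sigma_{z,a-1}. \qquad (73)$$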


We now have for b 4 a, since the square of any 9, is unity, 


OzaFz,a+1Fz,a+2+++Fz,b-1 


Oz,a—-192z,a—-2+++Fz,b419 2b 


Eaks = 2(F2a — 1 ya) (Fx + toy), (74) 


where the first or second line in the {} brackets is to be taken, according to 
whether g comes before or after g® in the sequence (65). From (55) of 
Chapter ITI, 


( Fxg _ 1) Oa = Ora — Wya; 


O2b(Oxb =f iO yb) = Ogb + 10 yb. 




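Thus the $\sigma_{za}$ and $\sigma_{zb}$ factors in the $\{\}$ brackets in (74) may be omitted, leaving in 
these brackets 

$$\begin{Bmatrix} \sigma_{z,a+1}\sigma_{z,a+2}\cdots\sigma_{z,b-1} \\ \sigma_{z,a-1}\sigma_{z,a-2}\cdots\sigma_{z,b+1} \end{Bmatrix} \qquad (75)$$

or, from (69), 

$$\begin{Bmatrix} (1 - 2n_{a+1})(1 - 2n_{a+2})\cdots(1 - 2n_{b-1}) \\ (1 - 2n_{a-1})(1 - 2n_{a-2})\cdots(1 - 2n_{b+1}) \end{Bmatrix}.$$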
The operator in the $\{\}$ brackets now commutes with $\tfrac12(\sigma_{xb} + i\sigma_{yb})$ and may be taken 
to the right of $\tfrac12(\sigma_{xb} + i\sigma_{yb})$ in (74). When multiplied into a $\psi$ represented 
by $(n_1 n_2 \ldots|)$, it will be equivalent to the factor $\pm 1$, the + or $-$ sign being taken 
according to whether $\sum n_c$, summed for all $c$ for which $q^{(c)}$ lies between $q^{(a)}$ 
and $q^{(b)}$ in the sequence (65), is even or odd. This holds in both cases: when $q^{(a)}$ 
comes before $q^{(b)}$ and we have to take the first line in (75), and when $q^{(a)}$ comes 
after $q^{(b)}$ and we have to take the second. Thus the operator (75) is equivalent to 
the $\pm$ sign in (67) and (71), and (71) reduces to 


$$U = \sum_a n_a U_{aa} + \sum_a \sum_{b \neq a} U_{ab}\,\bar\xi_a\xi_b, \qquad (76)$$


from which the ambiguity of sign has disappeared. 
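The reduction from (71) to (76) can be checked numerically. For every pair $a \neq b$ and every state of definite occupation numbers, $\bar\xi_a\xi_b$ should equal the bare product $\tfrac12(\sigma_{xa} - i\sigma_{ya})\,\tfrac12(\sigma_{xb} + i\sigma_{yb})$ multiplied by the sign $\pm 1$. That sign is fixed by the parity of $\sum n_c$ over the modes lying strictly between $a$ and $b$. This is a hedged sketch. It assumes the sequence (65) is ordered $1, \ldots, N$, and the function names are hypothetical.

```python
import numpy as np
from functools import reduce
from itertools import product

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
lower = (sx + 1j * sy) / 2   # (sigma_x + i sigma_y)/2, second matrix of (70)
raise_ = (sx - 1j * sy) / 2  # (sigma_x - i sigma_y)/2, first matrix of (70)


def on_mode(op, k, N):
    """Embed a one-mode operator acting on mode k (1-based) in the N-mode space."""
    return reduce(np.kron, [op if j == k else I2 for j in range(1, N + 1)])


def xi(b, N):
    """xi_b of (72): sigma_z string on modes 1..b-1, then the lowering matrix on mode b."""
    return reduce(np.kron, [sz] * (b - 1) + [lower] + [I2] * (N - b))


def basis(occ):
    """State with definite occupation numbers occ = (n_1, ..., n_N)."""
    return reduce(np.kron, [np.eye(2, dtype=complex)[n] for n in occ])


def sign_rule_holds(N):
    """Check xi_bar_a xi_b psi = (-1)**(sum of n_c strictly between a and b) * bare psi."""
    for a, b in product(range(1, N + 1), repeat=2):
        if a == b:
            continue
        lhs_op = xi(a, N).conj().T @ xi(b, N)                # xi-bar_a xi_b
        bare = on_mode(raise_, a, N) @ on_mode(lower, b, N)  # no sigma_z string
        lo, hi = min(a, b), max(a, b)
        for occ in product((0, 1), repeat=N):
            sign = (-1) ** sum(occ[c - 1] for c in range(lo + 1, hi))
            psi = basis(occ)
            if not np.allclose(lhs_op @ psi, sign * (bare @ psi)):
                return False
    return True


print(sign_rule_holds(4))
```

The check covers both cases of (75), $a$ before $b$ and $a$ after $b$. The $\sigma_{za}$ or $\sigma_{zb}$ factor at the end of the string is absorbed exactly as in the identities from (55) of Chapter III.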
We must now determine the commutability relations for the $\xi$'s and $\bar\xi$'s. 
We shall first prove that 

$$\xi_a\xi_b + \xi_b\xi_a = 0. \qquad (77)$$


If $a$ and $b$ are different, suppose $q^{(a)}$ comes before $q^{(b)}$ in the sequence (65). 
Then one of the factors in the expression for $\xi_b$ given by formula (72) anticommutes 
with one of the factors in the expression for $\xi_a$, namely $\sigma_{za}$ in $\xi_b$ anticommutes 
with $\tfrac12(\sigma_{xa} + i\sigma_{ya})$ in $\xi_a$, but apart from this every factor in $\xi_b$ commutes with 
every factor in $\xi_a$. Thus $\xi_b$ must anticommute with $\xi_a$. If $a$ and $b$ are the same, 
equation (77) states that $\xi_a^2 = 0$, and this holds since $(\sigma_{xa} + i\sigma_{ya})^2 = 0$. 
Thus equation (77) holds generally. In a similar way we can show that 


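$$\bar\xi_a\bar\xi_b + \bar\xi_b\bar\xi_a = 0 \qquad (78)$$

and

$$\bar\xi_a\xi_b + \xi_b\bar\xi_a = 0 \quad \text{for } b \neq a. \qquad (79)$$

We have further 

$$\bar\xi_a\xi_a = \tfrac12(\sigma_{xa} - i\sigma_{ya})\,\tfrac12(\sigma_{xa} + i\sigma_{ya}) = \tfrac12(1 - \sigma_{za}) = n_a \qquad (80)$$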




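from (69), and again 

$$\xi_a\bar\xi_a = \tfrac12(\sigma_{xa} + i\sigma_{ya})\,\tfrac12(\sigma_{xa} - i\sigma_{ya}) = \tfrac12(1 + \sigma_{za}) = 1 - n_a. \qquad (81)$$

Equations (80) and (81) show that 

$$\bar\xi_a\xi_a + \xi_a\bar\xi_a = 1,$$

and thus that (79) can be extended to 

$$\bar\xi_a\xi_b + \xi_b\bar\xi_a = \delta_{ab}. \qquad (82)$$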


The quantum conditions (77), (78) and (82) are to be compared with (15) of 
Chapter XI for the Einstein-Bose case. The only difference is a change in the signs 
on the left-hand sides. 
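The relations (77), (78), (80) and (82) can all be verified on the explicit matrices built from (72). This is a minimal sketch, assuming the mode ordering $1, \ldots, N$ for the sequence (65). The helper names are hypothetical. A Python list index `a` here stands for the mode $a + 1$.

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def xi(b, N):
    """xi_b of (72): sigma_z string on modes 1..b-1, then (sigma_x + i sigma_y)/2 on mode b."""
    return reduce(np.kron, [sz] * (b - 1) + [(sx + 1j * sy) / 2] + [I2] * (N - b))


def anticommutator(A, B):
    return A @ B + B @ A


def relations_hold(N):
    """Check (77), (78), (80) and (82) for all pairs of modes."""
    dim = 2 ** N
    xs = [xi(b, N) for b in range(1, N + 1)]
    xbs = [x.conj().T for x in xs]                      # conjugates, as in (73)
    for a in range(N):
        # n for mode a + 1, built from (69): n = (1 - sigma_z)/2
        n_a = reduce(np.kron, [I2] * a + [(I2 - sz) / 2] + [I2] * (N - a - 1))
        if not np.allclose(xbs[a] @ xs[a], n_a):         # (80)
            return False
        for b in range(N):
            delta = np.eye(dim) if a == b else np.zeros((dim, dim))
            if not (np.allclose(anticommutator(xs[a], xs[b]), 0)              # (77)
                    and np.allclose(anticommutator(xbs[a], xbs[b]), 0)        # (78)
                    and np.allclose(anticommutator(xbs[a], xs[b]), delta)):   # (82)
                return False
    return True


print(relations_hold(3))
```

Replacing the anticommutators by commutators, as in the Einstein-Bose case, would fail for these matrices. For example, $\xi_a\bar\xi_a - \bar\xi_a\xi_a = 1 - 2n_a$, which is not unity.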

From (80), (76) can be expressed in the form 

$$U = \sum_{a,b} \bar\xi_a U_{ab}\xi_b, \qquad (83)$$


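which is the same as (16) of Chapter XI. If we suppose $U$ to be the Hamiltonian 
of the assembly of particles, we shall get as the equations of motion for the $\xi$'s 

$$\begin{aligned} i\hbar\dot\xi_a &= \xi_a U - U\xi_a \\ &= \sum_{b,c} \left(\xi_a\bar\xi_b U_{bc}\xi_c - \bar\xi_b U_{bc}\xi_c\xi_a\right) \\ &= \sum_{b,c} \left(\xi_a\bar\xi_b + \bar\xi_b\xi_a\right) U_{bc}\xi_c \end{aligned}$$

from (77). This reduces, with the help of (82), to 

$$i\hbar\dot\xi_a = \sum_b U_{ab}\xi_b, \qquad (84)$$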


which is of the same form as (21) of Chapter XI, and as the wave equation 
for a single one of our particles by itself, with $\xi_a$ playing the part of $(q^{(a)}|)$. 
Thus our present scheme of equations may, like the scheme of §62, be considered as 
coming from a process of second quantization, the quantum conditions for which 
are (77), (78) and (82). 
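The step from (83) to (84) can be checked numerically. The commutator $\xi_a U - U\xi_a$ should reduce to $\sum_b U_{ab}\xi_b$ for any Hermitian matrix $U_{ab}$. This sketch is for illustration only. It assumes the mode ordering $1, \ldots, N$ and a randomly generated Hermitian $U_{ab}$ (a hypothetical stand-in for a real one-particle Hamiltonian). Because it compares the commutator itself, $\hbar$ never enters.

```python
import numpy as np
from functools import reduce

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def xi(b, N):
    """xi_b of (72): sigma_z string on modes 1..b-1, then (sigma_x + i sigma_y)/2 on mode b."""
    return reduce(np.kron, [sz] * (b - 1) + [(sx + 1j * sy) / 2] + [I2] * (N - b))


def check_equation_of_motion(N, seed=0):
    """Check xi_a U - U xi_a = sum_b U_ab xi_b, i.e. (84) without the factor i*hbar."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    U_ab = (M + M.conj().T) / 2             # hypothetical Hermitian one-particle matrix
    xs = [xi(b, N) for b in range(1, N + 1)]
    # U of (83): sum over a, b of xi-bar_a U_ab xi_b
    U = sum(U_ab[a, b] * (xs[a].conj().T @ xs[b])
            for a in range(N) for b in range(N))
    for a in range(N):
        lhs = xs[a] @ U - U @ xs[a]         # equals i*hbar d(xi_a)/dt
        rhs = sum(U_ab[a, b] * xs[b] for b in range(N))
        if not np.allclose(lhs, rhs):
            return False
    return True


print(check_equation_of_motion(3))
```

The check succeeds because $[\xi_a, \bar\xi_b\xi_c] = (\xi_a\bar\xi_b + \bar\xi_b\xi_a)\xi_c = \delta_{ab}\xi_c$. This is the same use of (77) and (82) made in the text.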

We may apply the foregoing scheme to the problem of several electrons 
interacting with the electromagnetic field. We may take the operator on 




the right-hand side of (52) as the above Hamiltonian $U$. The equation of 
motion (84) will then be of the same form as one of the equations (57), with $U$ 
involving only one set of space-time variables $x$, $t$ and with a second quantization 
applied to $\psi$, so that its values for different values of $x$ are not numbers, 
but operators satisfying the quantum conditions (77), (78), and (82) (the last 
of which has to be rewritten with a $\delta$ function instead of the two-suffix $\delta$ symbol, 
since the eigenvalues of $x$ take on continuous ranges of values). 


81. Conclusion 


The foregoing theory provides a quantum electrodynamics which is a satisfactory 
analogue of classical electrodynamics, each of the features of classical 
electrodynamics having its quantum counterpart. As a description of nature, 
though, the theory is incomplete, as it suffers from the same limitations as 
the classical theory with regard to the distribution of electric charge inside 
an electron. The quantum equations we have discussed for electrons in interaction 
with the field correspond to classical equations based on the point-charge model of 
the electron, i.e. the model in which all the charge is assumed to be concentrated at 
one point. Such a model in classical electrodynamics leads to an infinite mass for 
the electron, since the energy-density in the neighbourhood of the point where 
the charge is situated tends to infinity in a way that makes the total energy 
non-convergent. Analogously, the quantum theory that we have set up also leads 
to an infinite mass for the electron. This infinite mass here shows itself through 
the expression for $d\psi/dt$ given by the wave equation (50) having an infinite value, 
owing to the non-convergence of the contributions to $d\psi/dt$ arising from terms in 
the Hamiltonian corresponding to Fourier components of the field with very short 
wave-lengths. In consequence, the wave equation (50) and its equivalents (52) 
and (57) do not strictly have any solutions. 

The theory can be made to give finite and sensible answers for elementary 
problems, such as the emission and absorption of radiation whose wave-length 
is not too short, since it allows the probabilities one wishes to calculate to be 
expressed in terms of semi-convergent series or integrals, in which one can simply 
ignore the divergent part arising from the short wave-lengths. Such a procedure 
can lead to a definite answer, of course, only when the divergent part is 
clearly separated from the part we are interested in. The condition for this is 
that the important wave-lengths for the problem under consideration shall be 
long compared with the classical radius of the electron. The limitations in 
the applicability of quantum electrodynamics thus correspond precisely to those 
of classical electrodynamics. The amendments required in classical theory in 
order to make it apply accurately to the elementary charged particles are thus 
not provided by the passage to the quantum theory, that is, by the taking into 
consideration of the disturbances accompanying measurements. It seems that 
some essentially new physical ideas are here needed. 




Index 


action variable, 128 
angle variable, 128, 129 
angular momentum, 138 
anticommute, 64 
antisymmetrical wave function, 202 

bar notation, 19 
basic ψ, 45 
belonging to an eigenvalue, 29 
Bohr’s frequency condition, 115, 172 
boundary condition, 148 
bracket notation, 45 

causality, 3 
character (of a group), 208 
class of permutations, 205 
closed state, 148 
combination law, 1 
commutability relation, 83 
commute, 24 
compatible observations, 44 
complete set of commuting observables, 54 
conjugate complex, 20 
— — of a linear operator, 49 
— imaginary, 20 
constant of the motion, 114 
contact transformation, 106 

de Broglie waves, 125 
degenerate system, 164 
dependent states, 13 
diagonal element, 23, 76 
— matrix, 52, 76 
δ_rs, 46 
δ function, 68 

eigen, 29 
eigenfunction, 95 
Einstein’s photo-electric law, 7 
Einstein-Bose statistics, 203 
electric density, 244 
element of a matrix, 22 
exclusion principle, 203 
exclusive sets of states, 208 

Gibbs ensemble, 132 

h, ħ, 85 
half-width of absorption line, 197 
Hamiltonian, 111 
Heisenberg picture, 112 
— representation, 115 
Hermitian, 26 

identical permutation, 204 
improper function, 68 
independent states, 13 

Kramers-Heisenberg dispersion formula, 237 

length of a vector, 21 

magnetic anomaly of the spin, 156 
— moment of the electron, 155 
matrix, 22, 75 
mixed representative, 60 
multiplet, 176, 217 

non-degenerate system, 164 
normalization, 21 
— with continuous parameter, 74 

observable, 22, 34 
— having a value, 39 
— having an average value, 39 
orthogonal, 21 
orthogonality theorem, 30 

Pauli’s exclusion principle, 203 
permutations, 204 
phase factor, 21 
— space, 132 
Planck’s constant, 85 
Poisson bracket, 84 
positive square root, 38 
positron, 260 
probability amplitude, 62 
— of [an] observable having a value, 40 
proper-energy, 174 

quantum conditions, 83 

reciprocal of an observable, 37 
— permutation, 204 
representation, 48 
representative, 48 

scatterer, 179 
Schrödinger picture, 112 
Schrödinger’s wave equation, 110 
second quantization, 224 
selection rule, 151 
similar permutations, 205 
Sommerfeld’s formula, 259 
spatial quantization, 144 
square root of an observable, 38 
state, 10, 16 
— of absorption, 181 
— of motion, 7 
— of polarization, 4 
stationary state, 111 
stimulated emission, 172 
superposition of states, 11 
supplementary condition, 273 
symmetrical wave function, 201 

uncertainty principle, 98 
unit matrix, 48, 76 
unitary, 105 

wave equation, 110 
— function, 110 
— mechanics, 12 
— packet, 98 
weight function, 80 
well-ordered function, 108 

zero state, 226 


